From: Axel Rasmussen
Date: Fri, 23 Oct 2020 10:38:20 -0700
Subject: Re: [PATCH v4 1/1] mmap_lock: add tracepoints around lock acquisition
To: Vlastimil Babka
Cc: Steven Rostedt, Ingo Molnar, Andrew Morton, Michel Lespinasse, Daniel Jordan, Jann Horn, Chinwen Chang, Davidlohr Bueso, David Rientjes, Yafang Shao, LKML, Linux MM
References: <20201020184746.300555-1-axelrasmussen@google.com> <20201020184746.300555-2-axelrasmussen@google.com>

On Fri, Oct 23, 2020 at 7:00 AM Vlastimil Babka wrote:
>
> On 10/20/20 8:47 PM, Axel Rasmussen wrote:
> > The goal of these tracepoints is to be able to debug lock contention
> > issues. This lock is acquired on most (all?) mmap / munmap / page fault
> > operations, so a multi-threaded process which does a lot of these can
> > experience significant contention.
> >
> > We trace just before we start acquisition, when the acquisition returns
> > (whether it succeeded or not), and when the lock is released (or
> > downgraded). The events are broken out by lock type (read / write).
> >
> > The events are also broken out by memcg path. For container-based
> > workloads, users often think of several processes in a memcg as a single
> > logical "task", so collecting statistics at this level is useful.
> >
> > The end goal is to get latency information. This isn't directly included
> > in the trace events. Instead, users are expected to compute the time
> > between "start locking" and "acquire returned", using e.g. synthetic
> > events or BPF. The benefit we get from this is simpler code.
> >
> > Because we use tracepoint_enabled() to decide whether or not to trace,
> > this patch has effectively no overhead unless tracepoints are enabled at
> > runtime.
> > If tracepoints are enabled, there is a performance impact, but
> > how much depends on exactly what e.g. the BPF program does.
> >
> > Reviewed-by: Michel Lespinasse
> > Acked-by: Yafang Shao
> > Acked-by: David Rientjes
> > Signed-off-by: Axel Rasmussen
>
> All seem fine to me, except I started to wonder..
>
> > +
> > +#ifdef CONFIG_MEMCG
> > +
> > +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path);
> > +
> > +/*
> > + * Write the given mm_struct's memcg path to a percpu buffer, and return a
> > + * pointer to it. If the path cannot be determined, the buffer will contain the
> > + * empty string.
> > + *
> > + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
> > + * disabled by the caller before calling us, and re-enabled only after the
> > + * caller is done with the pointer.
>
> Is this enough? What if we fill the buffer and then an interrupt comes and the
> handler calls here again? We overwrite the buffer and potentially report a wrong
> cgroup after the execution resumes?
> If nothing worse can happen (are interrupts disabled while the ftrace code is
> copying from the buffer?), then it's probably ok?

I think you're right: get_cpu()/put_cpu() only deals with preemption, not
interrupts. I'm fairly sure this code can be called in interrupt context, so I
don't think we can use locks to prevent this situation either. It would go
like this: say we acquire the lock, an interrupt happens, and then we try to
acquire it again on the same CPU; we can't sleep, so we're stuck.

I don't think we can kmalloc here (instead of using a percpu buffer) either,
since I would guess kmalloc may itself acquire mmap_lock.

Is adding local_irq_save()/local_irq_restore() in addition to
get_cpu()/put_cpu() sufficient?
> > + */
> > +static const char *get_mm_memcg_path(struct mm_struct *mm)
> > +{
> > +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> > +
> > +	if (memcg != NULL && likely(memcg->css.cgroup != NULL)) {
> > +		char *buf = this_cpu_ptr(trace_memcg_path);
> > +
> > +		cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL);
> > +		return buf;
> > +	}
> > +	return "";
> > +}
> > +
> > +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                      \
> > +	do {                                                      \
> > +		get_cpu();                                        \
> > +		trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \
> > +				       ##__VA_ARGS__);            \
> > +		put_cpu();                                        \
> > +	} while (0)
> > +
> > +#else /* !CONFIG_MEMCG */
> > +
> > +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
> > +	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
> > +
> > +#endif /* CONFIG_MEMCG */
> > +
> > +/*
> > + * Trace calls must be in a separate file, as otherwise there's a circular
> > + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
> > + */
> > +
> > +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
> > +{
> > +	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write);
> > +}
> > +EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
> > +
> > +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
> > +					   bool success)
> > +{
> > +	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
> > +}
> > +EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
> > +
> > +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
> > +{
> > +	TRACE_MMAP_LOCK_EVENT(released, mm, write);
> > +}
> > +EXPORT_SYMBOL(__mmap_lock_do_trace_released);
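For concreteness, an untested sketch of the local_irq_save() variant I have in
mind (kernel fragment only, on top of the macro above):

```c
#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                      \
	do {                                                      \
		unsigned long flags;                              \
                                                                  \
		get_cpu();                                        \
		local_irq_save(flags);                            \
		trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \
				       ##__VA_ARGS__);            \
		local_irq_restore(flags);                         \
		put_cpu();                                        \
	} while (0)
```

That would keep an interrupt on the same CPU from refilling the percpu buffer
between get_mm_memcg_path() and the tracepoint consuming it, though it doesn't
help if the ftrace side copies the string after interrupts are re-enabled.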