From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59ADBC433E0 for ; Fri, 29 May 2020 01:39:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E830220723 for ; Fri, 29 May 2020 01:39:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="fNVWsM/z" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E830220723 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6ED108001A; Thu, 28 May 2020 21:39:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6785D80010; Thu, 28 May 2020 21:39:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5180E8001A; Thu, 28 May 2020 21:39:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0057.hostedemail.com [216.40.44.57]) by kanga.kvack.org (Postfix) with ESMTP id 356C180010 for ; Thu, 28 May 2020 21:39:49 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id DE4D8181AEF1E for ; Fri, 29 May 2020 01:39:48 +0000 (UTC) X-FDA: 76868050056.11.flag06_387f99f5ff943 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id C0874180F8B80 for ; Fri, 29 May 2020 01:39:48 +0000 (UTC) X-HE-Tag: flag06_387f99f5ff943 X-Filterd-Recvd-Size: 17227 Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com [209.85.167.67]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Fri, 29 May 2020 01:39:48 +0000 (UTC) Received: by mail-lf1-f67.google.com with SMTP id d7so297942lfi.12 for ; Thu, 28 May 2020 18:39:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=3nAFlqyEUmAGcdOEMDnSSz71NCUMuG6LIBc2hsSYk20=; b=fNVWsM/zPzDy9BGpvBkqxeAHCkQG3fUF+qq+En9Je6HIclXpFuFqaA3bQEVApUYzVy ixh0rYLVGIC4pPoNOk2wSTOLl8mQnoZ2BnbZm+LvifbA69aHwtOe7k8HQp3fV1hnJOyN iwWxPdvK/XJfatV9FJvEMn/GUkiTY/YCXGtClcIpsMLcMTL3rpc2tPBttOmpENhMUHOb 8RCkExCvdW/pmSKOTfPUNFSG+uu7cYFPIx+heDf6jjGMRdNGv7O46wV6treH9MD0VVm2 Kj49RQy5FAdW/gkJ3mkVOOo1CEykYgK+qQS4sy4z67wKnpQ/RFvUre09szrnl5Gs5ZIA oMAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=3nAFlqyEUmAGcdOEMDnSSz71NCUMuG6LIBc2hsSYk20=; b=sTyUxr5YcTKcEkmcOwINDgiTpAaJvsHsZi3chsOUF7QsJtUn9UGZoEY8UN9rl++zX2 jwuJrKKuPjACtNqKkXoVK8LXTlwcLPRD2Pjst/Nkn1bJs0xTiRMKfxzhuneQ4a4x4a8G 7xv2dqEiSZNuyr/aKRb3aLosju3zQu0wsITgB1Y+6wLEkWiLGenZ9KWgZv8XRxbbqFdS ZAsbpRyftHdR4oQn/8KWJ/ZBNYLDv+vuYYVwzAIClLPL424jTVT9wdNUjYuPHRKwcxg9 wchDFu199pnchACqzImSgJDxNpbSEMe2K4N/Rsy1bRLFG18KS5SuQ0iI8vpfWn8fqEex GgWA== X-Gm-Message-State: AOAM531kNCPt70zRqxxd1XpaMOA+qRSeQ3nPJlmqoFQCz4CayoUIqDf8 dfyw34XcvjexTx+HVmxFT7nbtsq+0WpIRG8nztIhLw== X-Google-Smtp-Source: ABdhPJwkYW1InQE9pYoiOfQPj61B8x+wylF+txbBIAOc22K5aylzJbg8ijQ7bBNfyubHJkx6nntj6leZe/yPSTy2FSU= X-Received: by 2002:ac2:4562:: with SMTP id k2mr3151107lfm.5.1590716386400; Thu, 28 May 2020 18:39:46 -0700 (PDT) MIME-Version: 1.0 References: <20200528235238.74233-1-axelrasmussen@google.com> <20200528202435.65396221@oasis.local.home> In-Reply-To: <20200528202435.65396221@oasis.local.home> From: Axel Rasmussen Date: Thu, 28 May 2020 18:39:08 -0700 Message-ID: Subject: Re: [PATCH v2 0/7] Add histogram measuring mmap_lock contention latency To: Steven Rostedt Cc: Andrew Morton , David Rientjes , Davidlohr Bueso , Ingo Molnar , Ingo Molnar , Jerome Glisse , Laurent Dufour , "Liam R . Howlett" , Matthew Wilcox , Michel Lespinasse , Peter Zijlstra , Vlastimil Babka , Will Deacon , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, AKASHI Takahiro , Aleksa Sarai , Alexander Potapenko , Alexey Dobriyan , Al Viro , Andrei Vagin , Ard Biesheuvel , Brendan Higgins , chenqiwu , Christian Brauner , Christian Kellner , Corentin Labbe , Daniel Jordan , Dan Williams , David Gow , "David S. Miller" , "Dmitry V. Levin" , "Eric W. Biederman" , Eugene Syromiatnikov , Jamie Liu , Jason Gunthorpe , John Garry , John Hubbard , Jonathan Adams , Junaid Shahid , Kees Cook , "Kirill A. Shutemov" , Konstantin Khlebnikov , Krzysztof Kozlowski , Mark Rutland , Masahiro Yamada , Masami Hiramatsu , Mathieu Desnoyers , Michal Hocko , Mikhail Zaslonko , Petr Mladek , Ralph Campbell , Randy Dunlap , Roman Gushchin , Shakeel Butt , Tal Gilboa , Thomas Gleixner , =?UTF-8?Q?Uwe_Kleine=2DK=C3=B6nig?= , Vincenzo Frascino , Yang Shi , Yu Zhao , Tom Zanussi Content-Type: multipart/alternative; boundary="00000000000026c63e05a6bf857b" X-Rspamd-Queue-Id: C0874180F8B80 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --00000000000026c63e05a6bf857b Content-Type: text/plain; charset="UTF-8" On Thu, May 28, 2020 at 5:24 PM Steven Rostedt wrote: > On Thu, 28 May 2020 16:52:38 -0700 > Axel Rasmussen wrote: > > Hi Axel, > > First, your patch threading is messed up. All the patches should be a > reply to this cover page, and not individual emails which get lost > among other patches. > Sorry - I'll fix this in the future. (I mistakenly ran git send-email on each patch individually, rather than globbing them into a single command.) > > Next, we already have histogram logic with trace events. Why not build > off of that. Or perhaps update lockdep which can record contention with > all locks. Why create yet another histogram infrastructure that is used > for just a specific purpose? > The use case we have in mind for this is to enable this instrumentation widely in Google's production fleet. Internally, we have a userspace thing which scrapes these metrics and publishes them such that we can look at aggregate metrics across our fleet. The thinking is that mechanisms like lockdep or getting histograms with e.g. BPF attached to the tracepoint introduces too much overhead for this to be viable. (Although, granted, I don't have benchmarks to prove this - if there's skepticism, I can produce such a thing - or prove myself wrong and rethink my approach. :) ) > > -- Steve > > > > The overall goal of this patchset is to add a latency histogram which > measures > > `mmap_lock` acquisition time. This is useful to measure the impact of > ongoing > > work like maple trees and range locks (https://lwn.net/Articles/787629/), > and > > it is also useful to debug userspace processes which experience long > waits due > > to lock contention. > > > > This patchset is built upon walken@google.com's new `mmap_lock` API > > (https://lkml.org/lkml/2020/4/21/1307). In its current form, it should > apply > > cleanly to a 5.7-rc7 tree to which Michel's patchset has already been > applied. > > > > To summarize the changes being made at a high level: > > > > - Add a histogram library: a `struct histogram` is effectively an array > of > > thresholds (i.e., buckets), and an array of per-cpu `u64` counts of the > > number of samples in each bucket. > > > > - Modify Michel's mmap_lock API to record samples in a histogram, owned > by the > > `mm_struct`, on each lock acquisition. For contended lock acquisitions, > we > > compute the amount of time spent waiting, which determines the bucket. > > > > - For uncontended cases, we still record a sample, but with "0" latency. > The > > reasoning for this is, a) we don't want to incur the overhead of > actually > > measuring the time, but b) we still want to end up with an accurate > count of > > acquisition attempts, as this lets us compute latency percentiles > (e.g., "x% > > of lock acquisitions completed in <= y ns"). > > > > Changes since v1 (sent to a few folks within Google for initial review): > > > > - Added a tracepoint to the contended case. > > - Modified `mmap_write_lock_nested` to split the {un,}contended cases. > > - Removed support for having more than one histogram in `mm_struct`. > > - Removed any histogram code not explicitly used in this patchset. > > - Whitespace cleanups. > > > > Axel Rasmussen (7): > > histogram: add struct histogram > > histogram: add helper function to expose histograms to userspace > > mmap_lock: add a histogram structure to struct mm_struct > > mmap_lock: allocate histogram (if enabled) in mm_init > > mmap_lock: add /proc//mmap_lock_contention interface > > mmap_lock: increment histogram whenever mmap_lock is acquired > > mmap_lock: add a tracepoint to contended acquisitions > > > > fs/proc/base.c | 25 +++ > > include/linux/histogram.h | 293 +++++++++++++++++++++++++++++++ > > include/linux/mm_types.h | 11 ++ > > include/linux/mmap_lock.h | 92 +++++++++- > > include/trace/events/mmap_lock.h | 34 ++++ > > kernel/fork.c | 55 ++++++ > > kernel/locking/rwsem.c | 4 +- > > lib/Kconfig | 3 + > > lib/Makefile | 2 + > > lib/histogram.c | 212 ++++++++++++++++++++++ > > mm/Kconfig | 13 ++ > > mm/Makefile | 1 + > > mm/mmap_lock.c | 46 +++++ > > 13 files changed, 782 insertions(+), 9 deletions(-) > > create mode 100644 include/linux/histogram.h > > create mode 100644 include/trace/events/mmap_lock.h > > create mode 100644 lib/histogram.c > > create mode 100644 mm/mmap_lock.c > > > > -- > > 2.27.0.rc0.183.gde8f92d652-goog > > --00000000000026c63e05a6bf857b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, May 28, 2020 at 5:24 PM Steve= n Rostedt <rostedt@goodmis.org> wrote:
On = Thu, 28 May 2020 16:52:38 -0700
Axel Rasmussen <
axelrasmussen@google.com> wrote:

Hi Axel,

First, your patch threading is messed up. All the patches should be a
reply to this cover page, and not individual emails which get lost
among other patches.

Sorry - I'll f= ix this in the future. (I mistakenly ran git send-email on each patch indiv= idually, rather than globbing them into a single command.)
=C2=A0=

Next, we already have histogram logic with trace events. Why not build
off of that. Or perhaps update lockdep which can record contention with
all locks. Why create yet another histogram infrastructure that is used
for just a specific purpose?

The use ca= se we have in mind for this is to enable this instrumentation widely in Goo= gle's production fleet. Internally, we have a userspace thing which scr= apes these metrics and publishes them such that we can look at aggregate me= trics across our fleet. The thinking is that mechanisms like lockdep or get= ting histograms with e.g. BPF attached to the tracepoint introduces too muc= h overhead for this to be viable. (Although, granted, I don't have benc= hmarks to prove this - if there's skepticism, I can produce such a thin= g - or prove myself wrong and rethink my approach. :) )
=C2=A0

-- Steve


> The overall goal of this patchset is to add a latency histogram which = measures
> `mmap_lock` acquisition time. This is useful to measure the impact of = ongoing
> work like maple trees and range locks (https://lwn.net/Articles= /787629/), and
> it is also useful to debug userspace processes which experience long w= aits due
> to lock contention.
>
> This patchset is built upon walken@google.com's new `mmap_lock` API
> (https://lkml.org/lkml/2020/4/21/1307). In its current = form, it should apply
> cleanly to a 5.7-rc7 tree to which Michel's patchset has already b= een applied.
>
> To summarize the changes being made at a high level:
>
> - Add a histogram library: a `struct histogram` is effectively an arra= y of
>=C2=A0 =C2=A0thresholds (i.e., buckets), and an array of per-cpu `u64` = counts of the
>=C2=A0 =C2=A0number of samples in each bucket.
>
> - Modify Michel's mmap_lock API to record samples in a histogram, = owned by the
>=C2=A0 `mm_struct`, on each lock acquisition. For contended lock acquis= itions, we
>=C2=A0 compute the amount of time spent waiting, which determines the b= ucket.
>
> - For uncontended cases, we still record a sample, but with "0&qu= ot; latency. The
>=C2=A0 =C2=A0reasoning for this is, a) we don't want to incur the o= verhead of actually
>=C2=A0 =C2=A0measuring the time, but b) we still want to end up with an= accurate count of
>=C2=A0 =C2=A0acquisition attempts, as this lets us compute latency perc= entiles (e.g., "x%
>=C2=A0 =C2=A0of lock acquisitions completed in <=3D y ns").
>
> Changes since v1 (sent to a few folks within Google for initial review= ):
>
> - Added a tracepoint to the contended case.
> - Modified `mmap_write_lock_nested` to split the {un,}contended cases.=
> - Removed support for having more than one histogram in `mm_struct`. > - Removed any histogram code not explicitly used in this patchset.
> - Whitespace cleanups.
>
> Axel Rasmussen (7):
>=C2=A0 =C2=A0histogram: add struct histogram
>=C2=A0 =C2=A0histogram: add helper function to expose histograms to use= rspace
>=C2=A0 =C2=A0mmap_lock: add a histogram structure to struct mm_struct >=C2=A0 =C2=A0mmap_lock: allocate histogram (if enabled) in mm_init
>=C2=A0 =C2=A0mmap_lock: add /proc/<pid>/mmap_lock_contention inte= rface
>=C2=A0 =C2=A0mmap_lock: increment histogram whenever mmap_lock is acqui= red
>=C2=A0 =C2=A0mmap_lock: add a tracepoint to contended acquisitions
>
>=C2=A0 fs/proc/base.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0|=C2=A0 25 +++
>=C2=A0 include/linux/histogram.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 | 293 +++++= ++++++++++++++++++++++++++
>=C2=A0 include/linux/mm_types.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2= =A0 11 ++
>=C2=A0 include/linux/mmap_lock.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 92 = +++++++++-
>=C2=A0 include/trace/events/mmap_lock.h |=C2=A0 34 ++++
>=C2=A0 kernel/fork.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 |=C2=A0 55 ++++++
>=C2=A0 kernel/locking/rwsem.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|= =C2=A0 =C2=A04 +-
>=C2=A0 lib/Kconfig=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 =C2=A03 +
>=C2=A0 lib/Makefile=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 =C2=A02 +
>=C2=A0 lib/histogram.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 | 212 ++++++++++++++++++++++
>=C2=A0 mm/Kconfig=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 13 ++
>=C2=A0 mm/Makefile=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 =C2=A01 +
>=C2=A0 mm/mmap_lock.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0|=C2=A0 46 +++++
>=C2=A0 13 files changed, 782 insertions(+), 9 deletions(-)
>=C2=A0 create mode 100644 include/linux/histogram.h
>=C2=A0 create mode 100644 include/trace/events/mmap_lock.h
>=C2=A0 create mode 100644 lib/histogram.c
>=C2=A0 create mode 100644 mm/mmap_lock.c
>
> --
> 2.27.0.rc0.183.gde8f92d652-goog

--00000000000026c63e05a6bf857b--