From: "Zach O'Keefe" <zokeefe@google.com>
Date: Fri, 25 Mar 2022 09:51:10 -0700
Subject: Re: [RFC PATCH 12/14] mm/madvise: introduce batched madvise(MADV_COLLPASE) collapse
To: Matthew Wilcox
Cc: David Rientjes, Yang Shi, Alex Shi, David Hildenbrand, Michal Hocko,
    Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan,
    Linux MM, Andrea Arcangeli, Andrew Morton, Arnd Bergmann,
    Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller,
    Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
    Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
    Thomas Bogendoerfer
References: <20220308213417.1407042-1-zokeefe@google.com>
            <20220308213417.1407042-13-zokeefe@google.com>

Hey All,

Sorry for the delay. I ran some synthetic tests on a dual-socket Skylake with configured batch sizes of 1, 8, 32, and 64. The basic setup was: 1 thread continuously madvise(MADV_COLLAPSE)'ing memory, 20 threads continuously faulting in pages, and some basic synchronization so that all threads follow an "only do work when all other threads have work to do" model (i.e. so we don't measure faults in the absence of simultaneous collapses, or vice versa). I used bpftrace attached to tracepoint:mmap_lock to measure read/write mmap_lock contention over 20 minutes.

Assuming we want to optimize for fault-path readers, the results are pretty clear: BATCH-1 outperforms BATCH-8, BATCH-32, and BATCH-64 by 254%, 381%, and 425% respectively, in terms of mean time for fault threads to acquire mmap_lock in read, while also having less tail latency (not calculated precisely; judged from the bpftrace histograms). If we cared at all about madvise(MADV_COLLAPSE) performance, BATCH-1 is 83-86% as fast as the others and holds mmap_lock in write for about the same amount of time in aggregate (~0 +/- 2%).

I've included the bpftrace histograms for fault threads acquiring mmap_lock in read at the end for posterity, and can provide more data / info if folks are interested.

In light of these results, I'll rework the code to iteratively operate on single hugepages, which should have the added benefit of considerably simplifying the code for an imminent V1 series.
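For reference, a bpftrace program along the following lines reproduces this kind of measurement. This is a sketch rather than the exact script used: the probes are the kernel's mmap_lock tracepoints, but the "fault-thread" comm filter and map names are assumptions chosen to match the output format below.

/*
 * Sketch: time how long fault threads wait to acquire mmap_lock in read.
 * Latencies are in nanoseconds; hist() produces the log2 histograms and
 * stats() the count/average/total lines shown in the data below.
 */
tracepoint:mmap_lock:mmap_lock_start_locking
/comm == "fault-thread" && !args->write/
{
	@start[tid] = nsecs;
}

tracepoint:mmap_lock:mmap_lock_acquire_returned
/@start[tid]/
{
	$lat = nsecs - @start[tid];
	@mmap_lock_r_acquire[comm] = hist($lat);
	@mmap_lock_r_acquire_stat[comm] = stats($lat);
	delete(@start[tid]);
}

END
{
	clear(@start);
}

The write-side (collapse-thread) acquisition and hold times mentioned above can be captured the same way by flipping the args->write filter, adding a probe on tracepoint:mmap_lock:mmap_lock_released for hold time.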
Thanks,
Zach

bpftrace data:

/*****************************************************************************/
batch size: 1

@mmap_lock_r_acquire[fault-thread]:
[128, 256)        1254 |                                                    |
[256, 512)     2691261 |@@@@@@@@@@@@@@@@@                                   |
[512, 1K)      2969500 |@@@@@@@@@@@@@@@@@@@                                 |
[1K, 2K)       1794738 |@@@@@@@@@@@                                         |
[2K, 4K)       1590984 |@@@@@@@@@@                                          |
[4K, 8K)       3273349 |@@@@@@@@@@@@@@@@@@@@@                               |
[8K, 16K)       851467 |@@@@@                                               |
[16K, 32K)      460653 |@@                                                  |
[32K, 64K)        7274 |                                                    |
[64K, 128K)         25 |                                                    |
[128K, 256K)         0 |                                                    |
[256K, 512K)         0 |                                                    |
[512K, 1M)     8085437 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1M, 2M)        381735 |@@                                                  |
[2M, 4M)            28 |                                                    |

@mmap_lock_r_acquire_stat[fault-thread]: count 22107705, average 326480, total 7217743234867

/*****************************************************************************/
batch size: 8

@mmap_lock_r_acquire[fault-thread]:
[128, 256)          55 |                                                    |
[256, 512)      247028 |@@@@@@                                              |
[512, 1K)       239083 |@@@@@@                                              |
[1K, 2K)        142296 |@@@                                                 |
[2K, 4K)        153149 |@@@@                                                |
[4K, 8K)       1899396 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K)      1780734 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
[16K, 32K)       95645 |@@                                                  |
[32K, 64K)        1933 |                                                    |
[64K, 128K)          3 |                                                    |
[128K, 256K)         0 |                                                    |
[256K, 512K)         0 |                                                    |
[512K, 1M)           0 |                                                    |
[1M, 2M)             0 |                                                    |
[2M, 4M)             0 |                                                    |
[4M, 8M)       1132899 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[8M, 16M)         3953 |                                                    |

@mmap_lock_r_acquire_stat[fault-thread]: count 5696174, average 1156055, total 6585091744973

/*****************************************************************************/
batch size: 32

@mmap_lock_r_acquire[fault-thread]:
[128, 256)          35 |                                                    |
[256, 512)       63413 |@                                                   |
[512, 1K)        78130 |@                                                   |
[1K, 2K)         39548 |                                                    |
[2K, 4K)         44331 |                                                    |
[4K, 8K)       2398751 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K)      1316932 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
[16K, 32K)       54798 |@                                                   |
[32K, 64K)         771 |                                                    |
[64K, 128K)          2 |                                                    |
[128K, 256K)         0 |                                                    |
[256K, 512K)         0 |                                                    |
[512K, 1M)           0 |                                                    |
[1M, 2M)             0 |                                                    |
[2M, 4M)             0 |                                                    |
[4M, 8M)             0 |                                                    |
[8M, 16M)            0 |                                                    |
[16M, 32M)      280791 |@@@@@@                                              |
[32M, 64M)         809 |                                                    |

@mmap_lock_r_acquire_stat[fault-thread]: count 4278311, average 1571585, total 6723733081824

/*****************************************************************************/
batch size: 64

@mmap_lock_r_acquire[fault-thread]:
[256, 512)       30303 |                                                    |
[512, 1K)        42366 |@                                                   |
[1K, 2K)         23679 |                                                    |
[2K, 4K)         22781 |                                                    |
[4K, 8K)       1637566 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[8K, 16K)      1955773 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K)       41832 |@                                                   |
[32K, 64K)         563 |                                                    |
[64K, 128K)          0 |                                                    |
[128K, 256K)         0 |                                                    |
[256K, 512K)         0 |                                                    |
[512K, 1M)           0 |                                                    |
[1M, 2M)             0 |                                                    |
[2M, 4M)             0 |                                                    |
[4M, 8M)             0 |                                                    |
[8M, 16M)            0 |                                                    |
[16M, 32M)           0 |                                                    |
[32M, 64M)      140723 |@@@                                                 |
[64M, 128M)         77 |                                                    |

@mmap_lock_r_acquire_stat[fault-thread]: count 3895663, average 1715797, total 6684170171691

On Thu, Mar 10, 2022 at 4:06 PM Zach O'Keefe wrote:
>
> On Thu, Mar 10, 2022 at 12:17 PM Matthew Wilcox wrote:
> >
> > On Thu, Mar 10, 2022 at 11:26:15AM -0800, David Rientjes wrote:
> > > One concern might be the queueing of read locks needed for page faults
> > > behind a collapser of a long range of memory that is otherwise looping
> > > and repeatedly taking the write lock.
> >
> > I would have thought that _not_ batching would improve this situation.
> > Unless our implementation of rwsems has changed since the last time I
> > looked, dropping-and-reacquiring a rwsem while there are pending readers
> > means you go to the end of the line and they all get to handle their
> > page faults.
> >
>
> Hey Matthew, thanks for the review / feedback.
>
> I don't have great intuition here, so I'll try to put together a
> simple synthetic test to get some data. Though the code would be
> different, I can functionally approximate a non-batched approach with
> a batch size of 1, and compare that against N.
>
> My file-backed patches likewise weren't able to take advantage of
> batching outside mmap lock contention, so the data should equally
> apply there.