From: Yang Shi <shy828301@gmail.com>
Date: Fri, 25 Mar 2022 12:54:42 -0700
Subject: Re: [RFC PATCH 12/14] mm/madvise: introduce batched madvise(MADV_COLLPASE) collapse
To: Zach O'Keefe
Cc: Matthew Wilcox, David Rientjes, Alex Shi, David Hildenbrand, Michal Hocko,
    Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan, Linux MM,
    Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen, Chris Kennelly,
    Chris Zankel, Helge Deller, Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley",
    Jens Axboe, "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
    Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu, Thomas Bogendoerfer
References: <20220308213417.1407042-1-zokeefe@google.com> <20220308213417.1407042-13-zokeefe@google.com>
Content-Type: text/plain; charset="UTF-8"

On Fri, Mar 25, 2022 at 9:51 AM Zach O'Keefe wrote:
>
> Hey All,
>
> Sorry for the delay. So, I ran some synthetic tests on a dual-socket
> Skylake with configured batch sizes of 1, 8, 32, and 64. Basic setup
> was: 1 thread continuously madvise(MADV_COLLAPSE)'ing memory, 20
> threads continuously faulting-in pages, and some basic synchronization
> so that all threads follow an "only do work when all other threads have
> work to do" model (i.e. so we don't measure faults in the absence of
> simultaneous collapses, or vice versa). I used bpftrace attached to
> tracepoint:mmap_lock to measure r/w mmap_lock contention over 20
> minutes.
>
> Assuming we want to optimize for fault-path readers, the results are
> pretty clear: BATCH-1 outperforms BATCH-8, BATCH-32, and BATCH-64 by
> 254%, 381%, and 425% respectively, in terms of mean time for
> fault-threads to acquire mmap_lock in read, while also having less
> tail latency (didn't calculate, just looked at bpftrace histograms).
> If we cared at all about madvise(MADV_COLLAPSE) performance, then
> BATCH-1 is 83-86% as fast as the others and holds mmap_lock in write
> for about the same amount of time in aggregate (~0 +/- 2%).
>
> I've included the bpftrace histograms for fault-threads acquiring
> mmap_lock in read at the end for posterity, and can provide more data
> / info if folks are interested.
>
> In light of these results, I'll rework the code to iteratively operate
> on single hugepages, which should have the added benefit of
> considerably simplifying the code for an imminent V1 series.

Thanks for the data. Yeah, I agree this is the best tradeoff.
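
For reference, a minimal sketch of the kind of harness described above (this is
not Zach's actual test code: the MADV_COLLAPSE value, the per-thread scratch
regions, the use of MADV_DONTNEED to regenerate work each round, and the region
sizes are all assumptions, and the "only do work when all other threads have
work to do" synchronization is omitted):

/*
 * Sketch only, under the assumptions above.  One thread repeatedly collapses
 * its region while NR_FAULTERS threads keep generating page faults in their
 * own regions of the same mm, so both sides contend on mmap_lock.
 */
#include <pthread.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE	25	/* assumed value; take it from the series' uapi headers */
#endif

#define HPAGE_SIZE	(2UL << 20)
#define REGION_SIZE	(64 * HPAGE_SIZE)
#define NR_FAULTERS	20

static char *collapse_region;

static void *collapse_fn(void *arg)
{
	for (;;) {
		/* Populate, collapse, then drop, so every round does real collapse work. */
		for (unsigned long off = 0; off < REGION_SIZE; off += 4096)
			collapse_region[off] = 1;
		madvise(collapse_region, REGION_SIZE, MADV_COLLAPSE);
		madvise(collapse_region, REGION_SIZE, MADV_DONTNEED);
	}
	return NULL;
}

static void *fault_fn(void *arg)
{
	char *buf = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	for (;;) {
		/* Touch every base page (mmap_lock taken in read), then drop them. */
		for (unsigned long off = 0; off < REGION_SIZE; off += 4096)
			buf[off] = 1;
		madvise(buf, REGION_SIZE, MADV_DONTNEED);
	}
	return NULL;
}

int main(void)
{
	pthread_t tids[NR_FAULTERS + 1];
	int i;

	collapse_region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	pthread_create(&tids[0], NULL, collapse_fn, NULL);
	for (i = 1; i <= NR_FAULTERS; i++)
		pthread_create(&tids[i], NULL, fault_fn, NULL);
	pthread_join(tids[0], NULL);	/* runs until interrupted */
	return 0;
}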
>
> Thanks,
> Zach
>
> bpftrace data:
>
> /*****************************************************************************/
> batch size: 1
>
> @mmap_lock_r_acquire[fault-thread]:
> [128, 256) 1254 | |
> [256, 512) 2691261 |@@@@@@@@@@@@@@@@@ |
> [512, 1K) 2969500 |@@@@@@@@@@@@@@@@@@@ |
> [1K, 2K) 1794738 |@@@@@@@@@@@ |
> [2K, 4K) 1590984 |@@@@@@@@@@ |
> [4K, 8K) 3273349 |@@@@@@@@@@@@@@@@@@@@@ |
> [8K, 16K) 851467 |@@@@@ |
> [16K, 32K) 460653 |@@ |
> [32K, 64K) 7274 | |
> [64K, 128K) 25 | |
> [128K, 256K) 0 | |
> [256K, 512K) 0 | |
> [512K, 1M) 8085437 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [1M, 2M) 381735 |@@ |
> [2M, 4M) 28 | |
>
> @mmap_lock_r_acquire_stat[fault-thread]: count 22107705, average
> 326480, total 7217743234867
>
> /*****************************************************************************/
> batch size: 8
>
> @mmap_lock_r_acquire[fault-thread]:
> [128, 256) 55 | |
> [256, 512) 247028 |@@@@@@ |
> [512, 1K) 239083 |@@@@@@ |
> [1K, 2K) 142296 |@@@ |
> [2K, 4K) 153149 |@@@@ |
> [4K, 8K) 1899396 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [8K, 16K) 1780734 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [16K, 32K) 95645 |@@ |
> [32K, 64K) 1933 | |
> [64K, 128K) 3 | |
> [128K, 256K) 0 | |
> [256K, 512K) 0 | |
> [512K, 1M) 0 | |
> [1M, 2M) 0 | |
> [2M, 4M) 0 | |
> [4M, 8M) 1132899 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [8M, 16M) 3953 | |
>
> @mmap_lock_r_acquire_stat[fault-thread]: count 5696174, average
> 1156055, total 6585091744973
>
> /*****************************************************************************/
> batch size: 32
>
> @mmap_lock_r_acquire[fault-thread]:
> [128, 256) 35 | |
> [256, 512) 63413 |@ |
> [512, 1K) 78130 |@ |
> [1K, 2K) 39548 | |
> [2K, 4K) 44331 | |
> [4K, 8K) 2398751 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [8K, 16K) 1316932 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [16K, 32K) 54798 |@ |
> [32K, 64K) 771 | |
> [64K, 128K) 2 | |
> [128K, 256K) 0 | |
> [256K, 512K) 0 | |
> [512K, 1M) 0 | |
> [1M, 2M) 0 | |
> [2M, 4M) 0 | |
> [4M, 8M) 0 | |
> [8M, 16M) 0 | |
> [16M, 32M) 280791 |@@@@@@ |
> [32M, 64M) 809 | |
>
> @mmap_lock_r_acquire_stat[fault-thread]: count 4278311, average
> 1571585, total 6723733081824
>
> /*****************************************************************************/
> batch size: 64
>
> @mmap_lock_r_acquire[fault-thread]:
> [256, 512) 30303 | |
> [512, 1K) 42366 |@ |
> [1K, 2K) 23679 | |
> [2K, 4K) 22781 | |
> [4K, 8K) 1637566 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [8K, 16K) 1955773 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K) 41832 |@ |
> [32K, 64K) 563 | |
> [64K, 128K) 0 | |
> [128K, 256K) 0 | |
> [256K, 512K) 0 | |
> [512K, 1M) 0 | |
> [1M, 2M) 0 | |
> [2M, 4M) 0 | |
> [4M, 8M) 0 | |
> [8M, 16M) 0 | |
> [16M, 32M) 0 | |
> [32M, 64M) 140723 |@@@ |
> [64M, 128M) 77 | |
>
> @mmap_lock_r_acquire_stat[fault-thread]: count 3895663, average
> 1715797, total 6684170171691
>
> On Thu, Mar 10, 2022 at 4:06 PM Zach O'Keefe wrote:
> >
> > On Thu, Mar 10, 2022 at 12:17 PM Matthew Wilcox wrote:
> > >
> > > On Thu, Mar 10, 2022 at 11:26:15AM -0800, David Rientjes wrote:
> > > > One concern might be the queueing of read locks needed for page faults
> > > > behind a collapser of a long range of memory that is otherwise looping
> > > > and repeatedly taking the write lock.
> > >
> > > I would have thought that _not_ batching would improve this situation.
> > > Unless our implementation of rwsems has changed since the last time I
> > > looked, dropping-and-reacquiring a rwsem while there are pending readers
> > > means you go to the end of the line and they all get to handle their
> > > page faults.
> > >
> >
> > Hey Matthew, thanks for the review / feedback.
> >
> > I don't have great intuition here, so I'll try to put together a
> > simple synthetic test to get some data. Though the code would be
> > different, I can functionally approximate a non-batched approach with
> > a batch size of 1, and compare that against N.
> >
> > My file-backed patches likewise weren't able to take advantage of
> > batching outside mmap lock contention, so the data should equally
> > apply there.
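
To make the locking tradeoff concrete, here is a rough, hypothetical sketch of
the per-hugepage ("batch size 1") pattern the thread converged on. It is not
the RFC's actual code: madvise_collapse_range() and collapse_one_hpage() are
invented names for illustration; only the placement of the mmap_lock write
lock/unlock is the point.

/*
 * Illustration only -- helper names are hypothetical, not the RFC's code.
 */
static int madvise_collapse_range(struct mm_struct *mm,
				  unsigned long start, unsigned long end)
{
	unsigned long addr;
	int ret = 0;

	for (addr = start; addr < end; addr += HPAGE_PMD_SIZE) {
		mmap_write_lock(mm);
		ret = collapse_one_hpage(mm, addr);	/* hypothetical helper */
		mmap_write_unlock(mm);
		/*
		 * Dropping the rwsem here on every hugepage puts this thread
		 * at the back of the waiter queue, so pending fault-path
		 * readers get to run before the next collapse.  A batched
		 * variant that holds the write lock across N hugepages keeps
		 * those readers waiting for the whole batch, which is what
		 * the measurements above show as longer read-acquire times.
		 */
		if (ret)
			break;
	}
	return ret;
}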