From: "Zach O'Keefe" <zokeefe@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>,
Yang Shi <shy828301@gmail.com>,
Alex Shi <alex.shi@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
Michal Hocko <mhocko@suse.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
SeongJae Park <sj@kernel.org>, Song Liu <songliubraving@fb.com>,
Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
Linux MM <linux-mm@kvack.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Arnd Bergmann <arnd@arndb.de>,
Axel Rasmussen <axelrasmussen@google.com>,
Chris Kennelly <ckennelly@google.com>,
Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
Hugh Dickins <hughd@google.com>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
Jens Axboe <axboe@kernel.dk>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matt Turner <mattst88@gmail.com>,
Max Filippov <jcmvbkbc@gmail.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Minchan Kim <minchan@kernel.org>,
Patrick Xia <patrickx@google.com>,
Pavel Begunkov <asml.silence@gmail.com>,
Peter Xu <peterx@redhat.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Re: [RFC PATCH 12/14] mm/madvise: introduce batched madvise(MADV_COLLPASE) collapse
Date: Fri, 25 Mar 2022 09:51:10 -0700 [thread overview]
Message-ID: <CAAa6QmRc76n-dspGT7UK8DkaqZAOz-CkCsME1V7KGtQ6Yt2FqA@mail.gmail.com> (raw)
In-Reply-To: <CAAa6QmQm8b8w6X98pc-MV+wMwKyQbXjMOjcxZS_C2Yh-WoiPag@mail.gmail.com>
Hey All,
Sorry for the delay. So, I ran some synthetic tests on a dual socket
Skylake with configured batch sizes of 1, 8, 32, and 64. Basic setup
was: 1 thread continuously madvise(MADV_COLLAPSE)'ing memory, 20
threads continuously faulting-in pages, and some basic synchronization
so that all threads follow a "only do work when all other threads have
work to do" model (i.e. so we don't measure faults in the absence of
simultaneous collapses, or vice versa). I used bpftrace attached to
tracepoint:mmap_lock to measure r/w mmap_lock contention over 20
minutes.
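For reference, the measurement was along these lines (reconstructed sketch,
not the exact script I ran; the "fault-thread" comm filter and map names are
illustrative, and this assumes a kernel built with CONFIG_TRACE_MMAP_LOCK so
the mmap_lock tracepoints exist):

```
bpftrace -e '
tracepoint:mmap_lock:mmap_lock_start_locking
/comm == "fault-thread" && !args->write/
{
	/* timestamp when a fault thread starts waiting for read */
	@start[tid] = nsecs;
}

tracepoint:mmap_lock:mmap_lock_acquire_returned
/@start[tid]/
{
	/* time from start-locking to acquire-returned */
	$delta = nsecs - @start[tid];
	@mmap_lock_r_acquire[comm] = hist($delta);
	@mmap_lock_r_acquire_stat[comm] = stats($delta);
	delete(@start[tid]);
}'
```

stats() is what produces the count / average / total lines quoted in the
data at the end.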
Assuming we want to optimize for fault-path readers, the results are
pretty clear: BATCH-1 outperforms BATCH-8, BATCH-32, and BATCH-64 by
254%, 381%, and 425% respectively, in terms of mean time for
fault-threads to acquire mmap_lock in read, while also having less
tail latency (didn't calculate, just looked at bpftrace histograms).
If we cared at all about madvise(MADV_COLLAPSE) performance, then
BATCH-1 is 83-86% as fast as the others and holds mmap_lock in write
for about the same amount of time in aggregate (~0 +/- 2%).
I've included the bpftrace histograms for fault-threads acquiring
mmap_lock in read at the end for posterity, and can provide more data
/ info if folks are interested.
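(For anyone checking the arithmetic: the 254% / 381% / 425% figures fall
straight out of the per-batch mean read-acquire latencies reported by the
stats() lines below, e.g.:)

```python
# Mean time (ns) for fault-threads to acquire mmap_lock in read,
# taken from the @mmap_lock_r_acquire_stat lines in the data below.
mean_ns = {1: 326480, 8: 1156055, 32: 1571585, 64: 1715797}

base = mean_ns[1]
for batch in (8, 32, 64):
    # Percent by which BATCH-1 outperforms BATCH-<batch> on mean
    # read-acquire latency (truncated to whole percent).
    pct = int((mean_ns[batch] / base - 1) * 100)
    print(f"BATCH-1 vs BATCH-{batch}: {pct}%")
# BATCH-1 vs BATCH-8: 254%
# BATCH-1 vs BATCH-32: 381%
# BATCH-1 vs BATCH-64: 425%
```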
In light of these results, I'll rework the code to iteratively operate
on single hugepages, which should have the added benefit of
considerably simplifying the code for an imminent V1 series.
Thanks,
Zach
bpftrace data:
/*****************************************************************************/
batch size: 1
@mmap_lock_r_acquire[fault-thread]:
[128, 256) 1254 | |
[256, 512) 2691261 |@@@@@@@@@@@@@@@@@ |
[512, 1K) 2969500 |@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 1794738 |@@@@@@@@@@@ |
[2K, 4K) 1590984 |@@@@@@@@@@ |
[4K, 8K) 3273349 |@@@@@@@@@@@@@@@@@@@@@ |
[8K, 16K) 851467 |@@@@@ |
[16K, 32K) 460653 |@@ |
[32K, 64K) 7274 | |
[64K, 128K) 25 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 8085437 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1M, 2M) 381735 |@@ |
[2M, 4M) 28 | |
@mmap_lock_r_acquire_stat[fault-thread]: count 22107705, average 326480, total 7217743234867
/*****************************************************************************/
batch size: 8
@mmap_lock_r_acquire[fault-thread]:
[128, 256) 55 | |
[256, 512) 247028 |@@@@@@ |
[512, 1K) 239083 |@@@@@@ |
[1K, 2K) 142296 |@@@ |
[2K, 4K) 153149 |@@@@ |
[4K, 8K) 1899396 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K) 1780734 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[16K, 32K) 95645 |@@ |
[32K, 64K) 1933 | |
[64K, 128K) 3 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 1132899 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8M, 16M) 3953 | |
@mmap_lock_r_acquire_stat[fault-thread]: count 5696174, average 1156055, total 6585091744973
/*****************************************************************************/
batch size: 32
@mmap_lock_r_acquire[fault-thread]:
[128, 256) 35 | |
[256, 512) 63413 |@ |
[512, 1K) 78130 |@ |
[1K, 2K) 39548 | |
[2K, 4K) 44331 | |
[4K, 8K) 2398751 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K) 1316932 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[16K, 32K) 54798 |@ |
[32K, 64K) 771 | |
[64K, 128K) 2 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 0 | |
[8M, 16M) 0 | |
[16M, 32M) 280791 |@@@@@@ |
[32M, 64M) 809 | |
@mmap_lock_r_acquire_stat[fault-thread]: count 4278311, average 1571585, total 6723733081824
/*****************************************************************************/
batch size: 64
@mmap_lock_r_acquire[fault-thread]:
[256, 512) 30303 | |
[512, 1K) 42366 |@ |
[1K, 2K) 23679 | |
[2K, 4K) 22781 | |
[4K, 8K) 1637566 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8K, 16K) 1955773 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K) 41832 |@ |
[32K, 64K) 563 | |
[64K, 128K) 0 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 0 | |
[8M, 16M) 0 | |
[16M, 32M) 0 | |
[32M, 64M) 140723 |@@@ |
[64M, 128M) 77 | |
@mmap_lock_r_acquire_stat[fault-thread]: count 3895663, average 1715797, total 6684170171691
On Thu, Mar 10, 2022 at 4:06 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Thu, Mar 10, 2022 at 12:17 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Mar 10, 2022 at 11:26:15AM -0800, David Rientjes wrote:
> > > One concern might be the queueing of read locks needed for page faults
> > > behind a collapser of a long range of memory that is otherwise looping
> > > and repeatedly taking the write lock.
> >
> > I would have thought that _not_ batching would improve this situation.
> > Unless our implementation of rwsems has changed since the last time I
> > looked, dropping-and-reacquiring a rwsem while there are pending readers
> > means you go to the end of the line and they all get to handle their
> > page faults.
> >
>
> Hey Matthew, thanks for the review / feedback.
>
> I don't have great intuition here, so I'll try to put together a
> simple synthetic test to get some data. Though the code would be
> different, I can functionally approximate a non-batched approach with
> a batch size of 1, and compare that against N.
>
> My file-backed patches likewise weren't able to take advantage of
> batching outside mmap lock contention, so the data should equally
> apply there.
2022-03-08 21:34 [RFC PATCH 00/14] mm: userspace hugepage collapse Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 01/14] mm/rmap: add mm_find_pmd_raw helper Zach O'Keefe
2022-03-09 22:48 ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 02/14] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-03-09 22:53 ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 03/14] mm/khugepaged: add __do_collapse_huge_page() helper Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 04/14] mm/khugepaged: separate khugepaged_scan_pmd() scan and collapse Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 05/14] mm/khugepaged: add mmap_assert_locked() checks to scan_pmd() Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 06/14] mm/khugepaged: add hugepage_vma_revalidate_pmd_count() Zach O'Keefe
2022-03-09 23:15 ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 07/14] mm/khugepaged: add vm_flags_ignore to hugepage_vma_revalidate_pmd_count() Zach O'Keefe
2022-03-09 23:17 ` Yang Shi
2022-03-10 0:00 ` Zach O'Keefe
2022-03-10 0:41 ` Yang Shi
2022-03-10 1:09 ` Zach O'Keefe
2022-03-10 2:16 ` Yang Shi
2022-03-10 15:50 ` Zach O'Keefe
2022-03-10 18:17 ` Yang Shi
2022-03-10 18:46 ` David Rientjes
2022-03-10 18:58 ` Zach O'Keefe
2022-03-10 19:54 ` Yang Shi
2022-03-10 20:24 ` Zach O'Keefe
2022-03-10 18:53 ` Zach O'Keefe
2022-03-10 15:56 ` David Hildenbrand
2022-03-10 18:39 ` Zach O'Keefe
2022-03-10 18:54 ` David Rientjes
2022-03-21 14:27 ` Michal Hocko
2022-03-08 21:34 ` [RFC PATCH 08/14] mm/thp: add madv_thp_vm_flags to __transparent_hugepage_enabled() Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 09/14] mm/khugepaged: record SCAN_PAGE_COMPOUND when scan_pmd() finds THP Zach O'Keefe
2022-03-09 23:40 ` Yang Shi
2022-03-10 0:46 ` Zach O'Keefe
2022-03-10 2:05 ` Yang Shi
2022-03-10 8:37 ` Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 10/14] mm/khugepaged: rename khugepaged-specific/not functions Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 11/14] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-03-09 23:43 ` Yang Shi
2022-03-10 1:11 ` Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 12/14] mm/madvise: introduce batched madvise(MADV_COLLPASE) collapse Zach O'Keefe
2022-03-10 0:06 ` Yang Shi
2022-03-10 19:26 ` David Rientjes
2022-03-10 20:16 ` Matthew Wilcox
2022-03-11 0:06 ` Zach O'Keefe
2022-03-25 16:51 ` Zach O'Keefe [this message]
2022-03-25 19:54 ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 13/14] mm/madvise: add __madvise_collapse_*_batch() actions Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 14/14] mm/madvise: add process_madvise(MADV_COLLAPSE) Zach O'Keefe
2022-03-21 14:32 ` [RFC PATCH 00/14] mm: userspace hugepage collapse Zi Yan
2022-03-21 14:51 ` Zach O'Keefe
2022-03-21 14:37 ` Michal Hocko
2022-03-21 15:46 ` Zach O'Keefe
2022-03-22 12:11 ` Michal Hocko
2022-03-22 15:53 ` Zach O'Keefe
2022-03-29 12:24 ` Michal Hocko
2022-03-30 0:36 ` Zach O'Keefe
2022-03-22 6:40 ` Zach O'Keefe
2022-03-22 12:05 ` Michal Hocko
2022-03-23 13:30 ` Zach O'Keefe