From: Axel Rasmussen <axelrasmussen@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Anish Moorthy <amoorthy@google.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Mike Kravetz <mike.kravetz@oracle.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Rapoport <rppt@kernel.org>,
Christian Brauner <brauner@kernel.org>,
linux-fsdevel@vger.kernel.org,
Andrea Arcangeli <aarcange@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
James Houghton <jthoughton@google.com>,
Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
Date: Thu, 7 Sep 2023 12:18:29 -0700 [thread overview]
Message-ID: <CAJHvVcjQR95KVfu2qv3hepkLWkH5J8qRG_BazHKSXoGoGnUATg@mail.gmail.com> (raw)
In-Reply-To: <20230905214235.320571-1-peterx@redhat.com>
On Tue, Sep 5, 2023 at 2:42 PM Peter Xu <peterx@redhat.com> wrote:
>
> Userfaultfd is the type of file that doesn't need wake-all semantics: if
> a message is enqueued (for either a fault address or an event), we only
> need to wake up one service thread to handle it. Waking up more is
> normally a waste of CPU cycles. Besides that, and more importantly, it
> just doesn't scale.
Hi Peter,
I took a quick look over the series and didn't see anything
objectionable. I was planning to actually test the series out and then
send out R-b's, but it will take some additional time (next week).
In the meantime, I was curious about the use case. A design I've seen
for VM live migration is to have 1 thread reading events off the uffd,
and then have many threads actually resolving the fault events that
come in (e.g. fetching pages over the network, issuing UFFDIO_COPY or
UFFDIO_CONTINUE, or whatever). In that design, since we only have a
single reader anyway, I think this series doesn't help.
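
(Very roughly, the design I have in mind looks like the sketch below. This
isn't code from any real VMM; enqueue()/dequeue() and the page-fetch step
are just stand-ins for whatever work queue and transport get used.)

/*
 * Single-reader sketch: one thread pulls fault events off the uffd,
 * many workers resolve them with UFFDIO_COPY.
 */
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

void enqueue(unsigned long addr);       /* hypothetical work queue */
unsigned long dequeue(void);            /* hypothetical work queue */

static long page_size;                  /* sysconf(_SC_PAGESIZE) at startup */

static void *reader_thread(void *arg)
{
        int uffd = *(int *)arg;
        struct uffd_msg msg;

        for (;;) {
                /* The single reader: only this thread ever blocks in read(). */
                if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
                        continue;
                if (msg.event == UFFD_EVENT_PAGEFAULT)
                        enqueue(msg.arg.pagefault.address);
        }
        return NULL;
}

static void *worker_thread(void *arg)
{
        int uffd = *(int *)arg;
        char *buf = aligned_alloc(page_size, page_size);

        for (;;) {
                unsigned long addr = dequeue() & ~(page_size - 1);

                /* ... fill buf, e.g. by fetching the page over the network ... */
                struct uffdio_copy copy = {
                        .dst = addr,
                        .src = (unsigned long)buf,
                        .len = page_size,
                };
                ioctl(uffd, UFFDIO_COPY, &copy);
        }
        return NULL;
}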
But, I'm curious if you have data indicating that > 1 reader is more
performant overall? I suspect it might be the case that, with "enough"
vCPUs, it makes sense to do so, but I don't have benchmark data to
tell me what that tipping point is yet.
OTOH, if one reader is plenty in ~all cases, optimizing this path is
less important.
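
(Concretely, the >1-reader configuration I'd want to measure is just N
threads all blocked in read() on the same uffd, roughly like the sketch
below; with wake-all semantics every enqueued message wakes all N of them,
while with this series only one of them should wake. handle_fault() is a
stand-in for the UFFDIO_COPY / UFFDIO_CONTINUE path.)

#include <linux/userfaultfd.h>
#include <pthread.h>
#include <unistd.h>

#define NR_READERS 8

void handle_fault(int uffd, struct uffd_msg *msg);      /* hypothetical */

static void *reader(void *arg)
{
        int uffd = *(int *)arg;
        struct uffd_msg msg;

        /* All NR_READERS threads block here on the same uffd. */
        for (;;)
                if (read(uffd, &msg, sizeof(msg)) == sizeof(msg))
                        handle_fault(uffd, &msg);
        return NULL;
}

static void start_readers(int *uffd)
{
        pthread_t tid[NR_READERS];

        for (int i = 0; i < NR_READERS; i++)
                pthread_create(&tid[i], NULL, reader, uffd);
}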
>
> Andrea used to have a patch that made read() O(1), but it never hit
> upstream. This is my effort to upstream that (it's a one-liner); on top of
> that I also made poll() O(1) on wakeup (more or less bringing
> EPOLLEXCLUSIVE to poll()), with some tests showing the effect.
>
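For anyone following along: epoll already offers this wake-one behavior via
EPOLLEXCLUSIVE (e.g. give each reader thread its own epoll instance, and
register the same uffd in all of them with EPOLLEXCLUSIVE), and my
understanding is that this series brings a comparable property to plain
poll() and to read(). A minimal sketch of the existing epoll-side usage,
with handle() as a stand-in:

#include <sys/epoll.h>
#include <linux/userfaultfd.h>
#include <unistd.h>

void handle(struct uffd_msg *msg);              /* hypothetical */

static void *poller_thread(void *arg)
{
        int uffd = *(int *)arg;                 /* created with O_NONBLOCK */
        int epfd = epoll_create1(0);
        struct epoll_event ev = {
                .events = EPOLLIN | EPOLLEXCLUSIVE,
                .data.fd = uffd,
        };

        epoll_ctl(epfd, EPOLL_CTL_ADD, uffd, &ev);

        for (;;) {
                struct epoll_event out;
                struct uffd_msg msg;

                if (epoll_wait(epfd, &out, 1, -1) != 1)
                        continue;
                /*
                 * Another exclusive waiter may have raced us to the
                 * message, hence the non-blocking uffd.
                 */
                if (read(uffd, &msg, sizeof(msg)) == sizeof(msg))
                        handle(&msg);
        }
        return NULL;
}
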
> To verify this, I added a test called uffd-perf (leveraging the refactored
> uffd selftest suite) that measures the messaging-channel latency on
> wakeups; the new test reflects the effect of the waitqueue optimizations:
>
> Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
> Units: milliseconds (to finish the test)
> |-----------------+--------+-------+------------|
> | test case | before | after | diff (%) |
> |-----------------+--------+-------+------------|
> | workers=8,poll | 1762 | 1133 | -55.516328 |
> | workers=8,read | 1437 | 585 | -145.64103 |
> | workers=16,poll | 1117 | 1097 | -1.8231541 |
> | workers=16,read | 1159 | 759 | -52.700922 |
> | workers=32,poll | 1001 | 973 | -2.8776978 |
> | workers=32,read | 866 | 713 | -21.458626 |
> |-----------------+--------+-------+------------|
>
> The more threads hanging on fd_wqh, the bigger the difference shown in the
> numbers. "8 worker threads" is the worst case here because it means up to
> 40-8=32 threads can be hanging idle on the fd_wqh queue.
>
> In real life there can be more workers than this, but a small number of
> active worker threads will cause a similar effect.
>
> This is currently based on Andrew's mm-unstable branch, but it should be
> applicable to most not-so-old trees.
>
> Comments welcomed, thanks.
>
> Andrea Arcangeli (1):
> mm/userfaultfd: Make uffd read() wait event exclusive
>
> Peter Xu (6):
> poll: Add a poll_flags for poll_queue_proc()
> poll: POLL_ENQUEUE_EXCLUSIVE
> fs/userfaultfd: Use exclusive waitqueue for poll()
> selftests/mm: Replace uffd_read_mutex with a semaphore
> selftests/mm: Create uffd_fault_thread_create|join()
> selftests/mm: uffd perf test
>
> drivers/vfio/virqfd.c | 4 +-
> drivers/vhost/vhost.c | 2 +-
> drivers/virt/acrn/irqfd.c | 2 +-
> fs/aio.c | 2 +-
> fs/eventpoll.c | 2 +-
> fs/select.c | 9 +-
> fs/userfaultfd.c | 8 +-
> include/linux/poll.h | 25 ++-
> io_uring/poll.c | 4 +-
> mm/memcontrol.c | 4 +-
> net/9p/trans_fd.c | 3 +-
> tools/testing/selftests/mm/Makefile | 2 +
> tools/testing/selftests/mm/uffd-common.c | 65 +++++++
> tools/testing/selftests/mm/uffd-common.h | 7 +
> tools/testing/selftests/mm/uffd-perf.c | 207 +++++++++++++++++++++++
> tools/testing/selftests/mm/uffd-stress.c | 53 +-----
> virt/kvm/eventfd.c | 2 +-
> 17 files changed, 337 insertions(+), 64 deletions(-)
> create mode 100644 tools/testing/selftests/mm/uffd-perf.c
>
> --
> 2.41.0
>