From: Axel Rasmussen
Date: Thu, 7 Sep 2023 12:18:29 -0700
Subject: Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Anish Moorthy,
 Alexander Viro, Mike Kravetz, Peter Zijlstra, Andrew Morton,
 Mike Rapoport, Christian Brauner, linux-fsdevel@vger.kernel.org,
 Andrea Arcangeli, Ingo Molnar, James Houghton, Nadav Amit
In-Reply-To: <20230905214235.320571-1-peterx@redhat.com>
On Tue, Sep 5, 2023 at 2:42 PM Peter Xu wrote:
>
> Userfaultfd is the type of file that doesn't need wake-all semantics: if
> there is a message enqueued (for either a fault address, or an event), we
> only need to wake up one service thread to handle it. Waking up more
> normally means a waste of cpu cycles. Besides that, and more importantly,
> that just doesn't scale.

Hi Peter,

I took a quick look over the series and didn't see anything objectionable.
I was planning to actually test the series out and then send out R-b's,
but that will take some additional time (next week). In the meantime, I
was curious about the use case.

A design I've seen for VM live migration is to have one thread reading
events off the uffd, and then many threads actually resolving the fault
events that come in (e.g. fetching pages over the network, issuing
UFFDIO_COPY or UFFDIO_CONTINUE, or whatever). In that design, since we
only have a single reader anyway, I think this series doesn't help.

But I'm curious whether you have data indicating that more than one
reader is more performant overall? I suspect that with "enough" vCPUs it
makes sense to do so, but I don't have benchmark data to tell me where
that tipping point is yet. OTOH, if one reader is plenty in ~all cases,
optimizing this path is less important.

>
> Andrea used to have one patch that made read() to be O(1) but never hit
> upstream. This is my effort to try upstreaming that (which is a
> oneliner..), meanwhile on top of that I also made poll() O(1) on wakeup,
> too (more or less bring EPOLLEXCLUSIVE to poll()), with some tests
> showing that effect.
>
> To verify this, I added a test called uffd-perf (leveraging the
> refactored uffd selftest suite) that will measure the messaging channel
> latencies on wakeups, and the waitqueue optimizations can be seen in the
> new test:
>
>   Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
>   Units: milliseconds (to finish the test)
>
>   |-----------------+--------+-------+------------|
>   | test case       | before | after | diff (%)   |
>   |-----------------+--------+-------+------------|
>   | workers=8,poll  |   1762 |  1133 | -55.516328 |
>   | workers=8,read  |   1437 |   585 | -145.64103 |
>   | workers=16,poll |   1117 |  1097 | -1.8231541 |
>   | workers=16,read |   1159 |   759 | -52.700922 |
>   | workers=32,poll |   1001 |   973 | -2.8776978 |
>   | workers=32,read |    866 |   713 | -21.458626 |
>   |-----------------+--------+-------+------------|
>
> The more threads hanging on the fd_wqh, the bigger the difference shown
> in the numbers. "8 worker threads" is the worst case here because it
> means there can be a worst case of 40-8=32 threads hanging idle on the
> fd_wqh queue.
>
> In real life, workers can be more than this, but a small number of
> active worker threads will cause a similar effect.
>
> This is currently based on Andrew's mm-unstable branch, but assuming
> this is applicable to most of the not-so-old trees.
>
> Comments welcomed, thanks.
>
> Andrea Arcangeli (1):
>   mm/userfaultfd: Make uffd read() wait event exclusive
>
> Peter Xu (6):
>   poll: Add a poll_flags for poll_queue_proc()
>   poll: POLL_ENQUEUE_EXCLUSIVE
>   fs/userfaultfd: Use exclusive waitqueue for poll()
>   selftests/mm: Replace uffd_read_mutex with a semaphore
>   selftests/mm: Create uffd_fault_thread_create|join()
>   selftests/mm: uffd perf test
>
>  drivers/vfio/virqfd.c                    |   4 +-
>  drivers/vhost/vhost.c                    |   2 +-
>  drivers/virt/acrn/irqfd.c                |   2 +-
>  fs/aio.c                                 |   2 +-
>  fs/eventpoll.c                           |   2 +-
>  fs/select.c                              |   9 +-
>  fs/userfaultfd.c                         |   8 +-
>  include/linux/poll.h                     |  25 ++-
>  io_uring/poll.c                          |   4 +-
>  mm/memcontrol.c                          |   4 +-
>  net/9p/trans_fd.c                        |   3 +-
>  tools/testing/selftests/mm/Makefile      |   2 +
>  tools/testing/selftests/mm/uffd-common.c |  65 +++++++
>  tools/testing/selftests/mm/uffd-common.h |   7 +
>  tools/testing/selftests/mm/uffd-perf.c   | 207 +++++++++++++++++++++++
>  tools/testing/selftests/mm/uffd-stress.c |  53 +-----
>  virt/kvm/eventfd.c                       |   2 +-
>  17 files changed, 337 insertions(+), 64 deletions(-)
>  create mode 100644 tools/testing/selftests/mm/uffd-perf.c
>
> --
> 2.41.0