Date: Fri, 8 Sep 2023 18:01:26 -0400
From: Peter Xu
To: Axel Rasmussen
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Anish Moorthy,
    Alexander Viro, Mike Kravetz, Peter Zijlstra, Andrew Morton,
    Mike Rapoport, Christian Brauner, linux-fsdevel@vger.kernel.org,
    Andrea Arcangeli, Ingo Molnar, James Houghton, Nadav Amit
Subject: Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
References: <20230905214235.320571-1-peterx@redhat.com>

On Thu, Sep 07, 2023 at 12:18:29PM -0700, Axel Rasmussen wrote:
> On Tue, Sep 5, 2023 at 2:42 PM Peter Xu wrote:
> >
> > Userfaultfd is the type of file that doesn't need wake-all semantics: if
> > there is a message enqueued (for either a fault address, or an event), we
> > only need to wake up one service thread to handle it.
> > Waking up more normally means a waste of cpu cycles. Besides that, and
> > more importantly, that just doesn't scale.
>
> Hi Peter,

Hi, Axel,

Sorry to respond late.

>
> I took a quick look over the series and didn't see anything
> objectionable. I was planning to actually test the series out and then
> send out R-b's, but it will take some additional time (next week).

Thanks.  The 2nd patch definitely needs some fixup on some functions (I
overlooked them because not enough CONFIG_* options were selected; I am
surprised I even had vhost compiled out when testing..), hope that won't
bring you too much trouble.  I'll send a fixup soon on top of patch 2.

>
> In the meantime, I was curious about the use case. A design I've seen
> for VM live migration is to have 1 thread reading events off the uffd,
> and then have many threads actually resolving the fault events that
> come in (e.g. fetching pages over the network, issuing UFFDIO_COPY or
> UFFDIO_CONTINUE, or whatever). In that design, since we only have a
> single reader anyway, I think this series doesn't help.

Yes.  If the test only uses 1 reader thread, the series shouldn't make
much difference.

>
> But, I'm curious if you have data indicating that > 1 reader is more
> performant overall? I suspect it might be the case that, with "enough"
> vCPUs, it makes sense to do so, but I don't have benchmark data to
> tell me what that tipping point is yet.
>
> OTOH, if one reader is plenty in ~all cases, optimizing this path is
> less important.

For myself, I don't yet have an application that can leverage this much,
because QEMU so far only uses 1 reader thread.  IIRC Anish was proposing
exactly some kvm-specific solutions to make a single uffd scale, and this
might be suitable for any use case like that, where we want to use a
single uffd and make it scale with threads.  Using 1 reader + N workers
is also a solution, but an app that uses N readers (which also do the
work) will hit this problem.

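To put a rough number on the N-reader case, here is a toy model in Python.
It is only a sketch, not code from the series and not a measurement: the
helper name count_wakeups and its parameters are made up for illustration,
and it assumes every reader is asleep on the wait queue when each message
is enqueued and that exactly one woken reader gets the message.

```python
def count_wakeups(n_readers, n_events, wake_all):
    """Toy model of readers blocked on one wait queue (hypothetical helper).

    Assumes n_readers >= 1, that all readers are asleep when each message
    arrives, and that exactly one woken reader dequeues the message.
    Returns (useful_wakeups, wasted_wakeups).
    """
    useful = wasted = 0
    for _ in range(n_events):
        # wake-all wakes every sleeping reader; wake-one wakes a single one
        woken = n_readers if wake_all else 1
        useful += 1          # one reader dequeues the message
        wasted += woken - 1  # the others wake up, find nothing, sleep again
    return useful, wasted


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n,
              count_wakeups(n, 1000, wake_all=True),
              count_wakeups(n, 1000, wake_all=False))
```

In this model the wasted wakeups grow as n_events * (n_readers - 1) with
wake-all and stay at zero with wake-one.  A real workload will differ
(readers may already be running, scheduling matters, and so on), so this
only shows the shape of the overhead, not its size.
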
I am also aware that some apps use more than 1 reader thread (umap), but I
don't really know more than that.

The problem is, I think an app shouldn't have to pay that overhead just
because it uses >1 readers; meanwhile it also doesn't make much sense to
wake up all readers for a single userfault event.  So it should always be
something good to have.

Thanks,

-- 
Peter Xu