Date: Tue, 21 Feb 2023 18:13:55 -0500
From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Axel Rasmussen, Mike Rapoport, Andrew Morton, Andrea Arcangeli, Nadav Amit, Muhammad Usama Anjum
Subject: Re: [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE

On Tue, Feb 21, 2023 at 01:43:28PM +0100, David Hildenbrand wrote:
> I think what we really want to avoid is, creating a new VMA and requiring to
> populate page tables just to set the PTEs softdirty.
>
> The VMA flag is one way, but it might prevent merging as we discovered.
> Changing the semantic of "pte_none()" to mean " dirty" is another one.

AFAIU, treating pte_none() as dirty obviously adds false positives in another way, compared to what happens when we merge VMAs.

> Simply because we cared about getting it precise for uffd-wp, which nobody
> cared for before for soft-dirty. And yes, there are similar issues to be
> solved.
>
> You are much rather turning uffd-wp with the async mode into a soft-dirty
> replacement,

Exactly. When I discussed uffd-wp with Andrea years ago, Andrea already mentioned replacing soft-dirty with uffd-wp. We weren't really clear about what the interface would look like; at that time the plan was not to use pagemap, but probably something else to avoid the pgtable walk. I later thought about other forms, such as ring structures, but not in much depth. Later on I figured that it may not be that trivial to do, and the benefit is not clear either: we know we may avoid pgtable walks, but we don't yet know what we would lose.

> instead using what we learned with uffd-wp to make soft-dirty more
> precise.

I hope it doesn't end up duplicating many things from userfaultfd, though. As I mentioned before, we can reserve yet another bit in pte markers for soft-dirty, and that was actually the plan; but if the two grow into something even more similar, it'd be fair for someone to ask "why bother?". The other thing is that, IIUC, soft-dirty would then just take on the burden of compatibility; if that works out, we probably don't need the uffd-wp async mode the other way round. In short, if we can have one thing working for all cases, IMHO we shouldn't bother duplicating it in the other.

>
> Fair enough, I won't interfere. The natural way for me to tackle this would
> be to try fixing soft-dirty instead, or handle the details on how soft-dirty
> is implemented internally: not exposing to user space that we are using
> uffd-wp under the hood, for example.
>
> Maybe that would be a reasonable approach?
> Handle this all internally if possible, and remove the old soft-dirty
> infrastructure once it's working.
>
> We wouldn't be able to use uffd-wp + softdirty, but who really cares I guess
> ...

The thing is, userfaultfd is already an exposed, formal kernel interface to userspace, before (and regardless of whether) this new async mode lands. IMHO in this case it's necessary to let the user know what's happening underneath, rather than deciding that it's not important and making the decision for the user. We don't want to surprise anyone, I guess.

It's not only about the case where a user may be using userfaultfd in the tracee app, so that the user knows why the "new soft-dirty" won't work there. It's also about maintaining compatibility with soft-dirty even if we want to replace it with uffd-wp some day: there will be at least a period where both of them exist, until we know they're solidly interchangeable. So far we're definitely not at that stage, and they're not alike. It's just that some of us wanted soft-dirty to change into something like uffd-wp; since that first way is not easily achievable, we can try the other way round.

>
> > One thing I didn't mention before (mostly referring to the 1st major
> > "defect" of using uffd-wp above I said [1] on memory types): _maybe_ we can
> > someday extend at least async mode of uffd-wp to all memory types, so it'll
> > even get everything covered. So far I don't see a strong requirement of
> > doing so, but I don't see a major blocker either.
>
> Architecture support is, of course, another issue. Of course, if we could
> replace soft-dirty tracking by uffd-wp internally that would make things
> easier ...

Yes, here it was about page caches, but arch support is another thing. Uffd-wp is just not as widely supported across architectures as soft-dirty, and many users may not need that accuracy (which comes at a cost in performance).
>
> > While the other "uffd cannot be nested" defect is actually the same to
> > soft-dirty (no way to have a tracee being able to clear_refs itself or
> > it'll also go a mess), it's just that we can still use soft-dirty to track
> > an uffd application.
>
> I wonder if we really care about that. Would be good to know if there are
> any relevant softdirty users still around ... from what I understood, even
> CRIU wants to handle it using uffd-wp.

Yeah, I don't know either.

>
> Jup.

What does this mean?

Thanks,

-- 
Peter Xu