From: Nadav Amit
Date: Wed, 20 Jul 2022 13:22:15 -0700
Subject: Re: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect
To: David Hildenbrand
Cc: Peter Xu, Linux MM, LKML, Andrew Morton, Mike Rapoport, Axel Rasmussen,
    Andrea Arcangeli, Andrew Cooper, Andy Lutomirski, Dave Hansen,
    Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin
Message-Id: <95320077-52CF-4CB0-92F9-523E1AE74A3D@gmail.com>
In-Reply-To: <4ad140b5-1d5b-2486-0893-7886a9cdfd76@redhat.com>

On Jul 20, 2022, at 12:55 PM, David Hildenbrand wrote:

> On 20.07.22 21:48, Peter Xu wrote:
>> On Wed, Jul 20, 2022 at 09:33:35PM +0200, David Hildenbrand wrote:
>>> On 20.07.22 21:15, Peter Xu wrote:
>>>> On Wed, Jul 20, 2022 at 05:10:37PM +0200, David Hildenbrand wrote:
>>>>> For pagecache pages it may as well be *plain wrong* to bypass the write
>>>>> fault handler and simply mark pages dirty+map them writable.
>>>>
>>>> Could you elaborate?
>>>
>>> Write-fault handling for some filesystems (that even require this
>>> "slow path") is a bit special.
>>>
>>> For example, do_shared_fault() might have to call page_mkwrite().
>>>
>>> AFAIK file systems use that for lazy allocation of disk blocks.
>>> If you simply go ahead and map a !dirty pagecache page writable
>>> and mark it dirty, it will not trigger page_mkwrite() and you might
>>> end up corrupting data.
>>>
>>> That's why we the old change_pte_range() code never touched
>>> anything if the pte wasn't already dirty.
>>
>> I don't think that pte_dirty() check was for the pagecache code. For any fs
>> that has page_mkwrite() defined, it'll already have vma_wants_writenotify()
>> return 1, so we'll never try to add write bit, hence we'll never even try
>> to check pte_dirty().
>
> I might be too tired, but the whole reason we had this magic before my
> commit in place was only for the pagecache.
>
> With vma_wants_writenotify()=0 you can directly map the pages writable
> and don't have to do these advanced checks here. In a writable
> MAP_SHARED VMA you'll already have pte_write().
>
> We only get !pte_write() in case we have vma_wants_writenotify()=1 ...
>
>   try_change_writable = vma_wants_writenotify(vma, vma->vm_page_prot);
>
> and that's the code that checked the dirty bit after all to decide --
> amongst other things -- if we can simply map it writable without going
> via the write fault handler and triggering do_shared_fault() .
>
> See crazy/ugly FOLL_FORCE code in GUP that similarly checks the dirty bit.

I thought you want to get rid of it at least for anonymous pages. No?

>
> But yeah, it's all confusing so I might just be wrong regarding
> pagecache pages.

Just to note: I am not very courageous, and I did not intend to change the
condition for when non-anonymous pages are set as writable. That’s the
reason I did not change the dirty bit for non-writable non-anonymous
entries (as Peter said). And that’s the reason that setting the dirty bit
(at least as I should have done it) is only performed after we have made
the decision on the write-bit.

IOW, after you have made your decision about the write-bit, then and only
then you may be able to set the dirty bit for writable entries. Since the
entry is already writable (i.e., can be written later directly from
userspace without a fault), there should be no correctness concern when
you set it.
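
To make the ordering concrete, here is a minimal userspace sketch of what I
mean. It is not the kernel code: toy_pte_t, the TOY_PTE_* bits and
toy_apply() are names made up for illustration, and the write-bit decision
is passed in as a plain boolean because this sketch deliberately does not
touch how that decision is made. It only shows that the dirty bit is
considered after the write-bit is final, and only for writable entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy PTE; not the kernel's real layout. */
#define TOY_PTE_WRITE (1u << 0)
#define TOY_PTE_DIRTY (1u << 1)

typedef uint32_t toy_pte_t;

/*
 * may_write is the result of the existing write-bit decision, made
 * elsewhere and left unchanged by this sketch (assumption: the caller
 * already applied whatever checks the VMA requires).
 */
static toy_pte_t toy_apply(toy_pte_t pte, bool may_write)
{
	/* Step 1: apply the write-bit decision. */
	if (may_write)
		pte |= TOY_PTE_WRITE;

	/*
	 * Step 2: only now look at the dirty bit, and only for entries
	 * that ended up writable. Non-writable entries are not touched.
	 */
	if (pte & TOY_PTE_WRITE)
		pte |= TOY_PTE_DIRTY;

	return pte;
}

/* Small self-check of the two cases discussed above. */
static void toy_selfcheck(void)
{
	/* Decision was "writable": the entry is also marked dirty. */
	assert(toy_apply(0, true) == (TOY_PTE_WRITE | TOY_PTE_DIRTY));
	/* Decision was "not writable": the dirty bit is left alone. */
	assert(toy_apply(0, false) == 0);
}
```

The point of keeping the two steps separate is that step 2 can never turn
a non-writable entry into a writable one, so it cannot change which
accesses go through the write fault handler.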