Date: Thu, 8 Dec 2022 11:29:00 -0500
From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ives van Hoorne, stable@vger.kernel.org, Andrew Morton, Hugh Dickins, Alistair Popple, Mike Rapoport, Nadav Amit, Andrea Arcangeli
Subject: Re: [PATCH v1] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
References: <20221208114137.35035-1-david@redhat.com>
In-Reply-To: <20221208114137.35035-1-david@redhat.com>

On Thu, Dec 08, 2022 at 12:41:37PM +0100, David Hildenbrand wrote:
> Currently, we don't enable writenotify when enabling userfaultfd-wp on
> a shared writable mapping (for now only shmem and hugetlb). The consequence
> is that vma->vm_page_prot will still include write permissions, to be set
> as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
> page migration, ...).
>
> So far, vma->vm_page_prot is assumed to be a safe default, meaning that
> we only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
> wrprotect). For example, when enabling softdirty tracking, we enable
> writenotify. With uffd-wp on shared mappings, that changed. More details
> on vma->vm_page_prot semantics were summarized in [1].
>
> This is problematic for uffd-wp: we'd have to manually check for
> uffd-wp PTEs/PMDs and manually write-protect PTEs/PMDs, which is error
> prone. Prone to such issues is any code that uses vma->vm_page_prot to set
> PTE permissions: primarily pte_modify() and mk_pte().
>
> Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped
> write-protected as default and we will only allow selected PTEs that are
> definitely safe to be mapped without write-protection (see
> can_change_pte_writable()) to be writable. In the future, we might want
> to enable write-bit recovery -- e.g., can_change_pte_writable() -- at
> more locations, for example, also when removing uffd-wp protection.
>
> This fixes two known cases:
>
> (a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
>     in uffd-wp not triggering on write access.
> (b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
>     writable, resulting in uffd-wp not triggering on write access.
>
> Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
> without NUMA hinting (which currently doesn't seem to be applicable to
> shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.
> On such a VMA, userfaultfd-wp is currently non-functional.
>
> Note that when enabling userfaultfd-wp, there is no need to walk page
> tables to enforce the new default protection for the PTEs: we know that
> they cannot be uffd-wp'ed yet, because that can only happen after
> enabling uffd-wp for the VMA in general.
>
> Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
> accidentally set the write bit -- which would result in uffd-wp not
> triggering on later write access. This commit makes uffd-wp on shmem behave
> just like uffd-wp on anonymous memory (iow, less special) in that regard,
> even though mixing mprotect with uffd-wp is controversial.
>
> [1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com
>
> Reported-by: Ives van Hoorne
> Debugged-by: Peter Xu
> Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
> Cc: stable@vger.kernel.org
> Cc: Andrew Morton
> Cc: Hugh Dickins
> Cc: Alistair Popple
> Cc: Mike Rapoport
> Cc: Nadav Amit
> Cc: Andrea Arcangeli
> Signed-off-by: David Hildenbrand

Acked-by: Peter Xu

One trivial nit.

> ---
>
> As discussed in [2], this is supposed to replace the fix by Peter:
> 	[PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover
> 	pte
>
> This survives vm/selftests and my reproducers:
> * migrating pages that are uffd-wp'ed using mbind() on a machine with 2
>   NUMA nodes
> * Using a PROT_WRITE mapping with uffd-wp
> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and
>   mprotect()'ing it PROT_WRITE
> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and
>   temporarily mprotect()'ing it PROT_READ
>
> uffd-wp properly triggers in all cases. On v6.1-rc8, all my reproducers
> fail.
>
> It would be good to get some more testing feedback and review.
>
> [2] https://lkml.kernel.org/r/20221202122748.113774-1-david@redhat.com
>
> ---
>  fs/userfaultfd.c | 28 ++++++++++++++++++++++------
>  mm/mmap.c        |  4 ++++
>  2 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 98ac37e34e3d..fb0733f2e623 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
>  	return ctx->features & UFFD_FEATURE_INITIALIZED;
>  }
>
> +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma,
> +				     vm_flags_t flags)
> +{
> +	const bool uffd_wp = !!((vma->vm_flags | flags) & VM_UFFD_WP);

IIUC this can be "uffd_wp_changed" then switch "|" to "^".  But not a hot
path at all, so shouldn't matter a lot.

Thanks,

> +
> +	vma->vm_flags = flags;
> +	/*
> +	 * For shared mappings, we want to enable writenotify while
> +	 * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply
> +	 * recalculate vma->vm_page_prot whenever userfaultfd-wp is involved.
> +	 */
> +	if ((vma->vm_flags & VM_SHARED) && uffd_wp)
> +		vma_set_page_prot(vma);
> +}

-- 
Peter Xu
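
P.S. To make the nit above concrete, here is a minimal userspace sketch (not kernel code) of the two predicates being compared. The flag values, the helper names uffd_wp_involved() and uffd_wp_changed(), and count_disagreements() are made up for illustration; only the "|" and "^" expressions come from the quoted hunk and the comment on it.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long vm_flags_t;

/* Illustrative values only; the kernel defines its own VM_* flags. */
#define VM_SHARED	0x1UL
#define VM_UFFD_WP	0x2UL

/* As posted: true when uffd-wp is set before or after the update. */
static bool uffd_wp_involved(vm_flags_t old_flags, vm_flags_t new_flags)
{
	return !!((old_flags | new_flags) & VM_UFFD_WP);
}

/* Suggested variant: true only when the uffd-wp bit actually toggles. */
static bool uffd_wp_changed(vm_flags_t old_flags, vm_flags_t new_flags)
{
	return !!((old_flags ^ new_flags) & VM_UFFD_WP);
}

/* Count the old/new combinations where the two predicates disagree. */
static int count_disagreements(void)
{
	/* VM_SHARED is kept set to show that unrelated bits do not matter. */
	const vm_flags_t states[] = { VM_SHARED, VM_SHARED | VM_UFFD_WP };
	int n = 0;

	for (int i = 0; i < 2; i++)
		for (int j = 0; j < 2; j++)
			if (uffd_wp_involved(states[i], states[j]) !=
			    uffd_wp_changed(states[i], states[j]))
				n++;

	/* Only the "set before and set after" case differs. */
	assert(n == 1);
	return n;
}
```

Both forms fire on every enable and disable transition. The only difference is that the "|" form also recalculates when the bit stays set, which is harmless off the hot path.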