Message-ID: <83259ac7-1aa4-e186-43d9-2b280795e510@redhat.com>
Date: Thu, 8 Dec 2022 12:45:26 +0100
Subject: Re: [PATCH v1] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Ives van Hoorne, Peter Xu, stable@vger.kernel.org, Andrew Morton, Hugh Dickins, Alistair Popple, Mike Rapoport, Nadav Amit, Andrea Arcangeli
References: <20221208114137.35035-1-david@redhat.com>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <20221208114137.35035-1-david@redhat.com>

On 08.12.22 12:41, David Hildenbrand wrote: > Currently, we don't enable writenotify when enabling userfaultfd-wp on > a shared writable mapping (for now only shmem and hugetlb). The consequence > is that vma->vm_page_prot will still include write permissions, to be set > as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting, > page migration, ...). > > So far, vma->vm_page_prot is assumed to be a safe default, meaning that > we only add permissions (e.g., mkwrite) but not remove permissions (e.g., > wrprotect). For example, when enabling softdirty tracking, we enable > writenotify. 
With uffd-wp on shared mappings, that changed. More details > on vma->vm_page_prot semantics were summarized in [1]. > > This is problematic for uffd-wp: we'd have to manually check for > a uffd-wp PTEs/PMDs and manually write-protect PTEs/PMDs, which is error > prone. Prone to such issues is any code that uses vma->vm_page_prot to set > PTE permissions: primarily pte_modify() and mk_pte(). > > Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped > write-protected as default and we will only allow selected PTEs that are > definitely safe to be mapped without write-protection (see > can_change_pte_writable()) to be writable. In the future, we might want > to enable write-bit recovery -- e.g., can_change_pte_writable() -- at > more locations, for example, also when removing uffd-wp protection. > > This fixes two known cases: > > (a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting > in uffd-wp not triggering on write access. > (b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs > writable, resulting in uffd-wp not triggering on write access. > > Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even > without NUMA hinting (which currently doesn't seem to be applicable to > shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA. > On such a VMA, userfaultfd-wp is currently non-functional. > > Note that when enabling userfaultfd-wp, there is no need to walk page > tables to enforce the new default protection for the PTEs: we know that > they cannot be uffd-wp'ed yet, because that can only happen after > enabling uffd-wp for the VMA in general. > > Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not > accidentally set the write bit -- which would result in uffd-wp not > triggering on later write access. 
This commit makes uffd-wp on shmem behave > just like uffd-wp on anonymous memory (iow, less special) in that regard, > even though, mixing mprotect with uffd-wp is controversial. > > [1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com > > Reported-by: Ives van Hoorne > Debugged-by: Peter Xu > Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") > Cc: stable@vger.kernel.org > Cc: Andrew Morton > Cc: Hugh Dickins No idea how a wrong mail address from Hugh sneaked in (I assume, copy-paste issue from de1ccfb64824). Let's properly cc him and keep the full patch. > Cc: Alistair Popple > Cc: Mike Rapoport > Cc: Nadav Amit > Cc: Andrea Arcangeli > Signed-off-by: David Hildenbrand > --- > > As discussed in [2], this is supposed to replace the fix by Peter: > [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover > pte > > This survives vm/selftests and my reproducers: > * migrating pages that are uffd-wp'ed using mbind() on a machine with 2 > NUMA nodes > * Using a PROT_WRITE mapping with uffd-wp > * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and > mprotect()'ing it PROT_WRITE > * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and > temporarily mprotect()'ing it PROT_READ > > uffd-wp properly triggers in all cases. On v6.1-rc8, all my reproducers > fail. > > It would be good to get some more testing feedback and review. 
> > [2] https://lkml.kernel.org/r/20221202122748.113774-1-david@redhat.com > > --- > fs/userfaultfd.c | 28 ++++++++++++++++++++++------ > mm/mmap.c | 4 ++++ > 2 files changed, 26 insertions(+), 6 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 98ac37e34e3d..fb0733f2e623 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx) > return ctx->features & UFFD_FEATURE_INITIALIZED; > } > > +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, > + vm_flags_t flags) > +{ > + const bool uffd_wp = !!((vma->vm_flags | flags) & VM_UFFD_WP); > + > + vma->vm_flags = flags; > + /* > + * For shared mappings, we want to enable writenotify while > + * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply > + * recalculate vma->vm_page_prot whenever userfaultfd-wp is involved. > + */ > + if ((vma->vm_flags & VM_SHARED) && uffd_wp) > + vma_set_page_prot(vma); > +} > + > static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode, > int wake_flags, void *key) > { > @@ -618,7 +633,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, > for_each_vma(vmi, vma) { > if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) { > vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; > - vma->vm_flags &= ~__VM_UFFD_FLAGS; > + userfaultfd_set_vm_flags(vma, > + vma->vm_flags & ~__VM_UFFD_FLAGS); > } > } > mmap_write_unlock(mm); > @@ -652,7 +668,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) > octx = vma->vm_userfaultfd_ctx.ctx; > if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) { > vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; > - vma->vm_flags &= ~__VM_UFFD_FLAGS; > + userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); > return 0; > } > > @@ -733,7 +749,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma, > } else { > /* Drop uffd context if remap feature not enabled */ > 
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; > - vma->vm_flags &= ~__VM_UFFD_FLAGS; > + userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); > } > } > > @@ -895,7 +911,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file) > prev = vma; > } > > - vma->vm_flags = new_flags; > + userfaultfd_set_vm_flags(vma, new_flags); > vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; > } > mmap_write_unlock(mm); > @@ -1463,7 +1479,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > * the next vma was merged into the current one and > * the current one has not been updated yet. > */ > - vma->vm_flags = new_flags; > + userfaultfd_set_vm_flags(vma, new_flags); > vma->vm_userfaultfd_ctx.ctx = ctx; > > if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) > @@ -1651,7 +1667,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, > * the next vma was merged into the current one and > * the current one has not been updated yet. > */ > - vma->vm_flags = new_flags; > + userfaultfd_set_vm_flags(vma, new_flags); > vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; > > skip: > diff --git a/mm/mmap.c b/mm/mmap.c > index a5eb2f175da0..6033d20198b0 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1525,6 +1525,10 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) > if (vma_soft_dirty_enabled(vma) && !is_vm_hugetlb_page(vma)) > return 1; > > + /* Do we need write faults for uffd-wp tracking? */ > + if (userfaultfd_wp(vma)) > + return 1; > + > /* Specialty mapping? */ > if (vm_flags & VM_PFNMAP) > return 0; > > base-commit: 8ed710da2873c2aeb3bb805864a699affaf1d03b -- Thanks, David / dhildenb
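
To make the effect on the default protection concrete, here is a minimal userspace model of the logic the patch touches. It is a sketch, not kernel code: the MODEL_* flag values, struct model_vma and all model_* helpers are hypothetical names made up for illustration. It assumes a shmem-like mapping and models only the shared-writable precondition and the new uffd-wp check; the other checks in vma_wants_writenotify() (page_mkwrite, softdirty, VM_PFNMAP, writeback accounting) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_VM_WRITE   0x1u
#define MODEL_VM_SHARED  0x2u
#define MODEL_VM_UFFD_WP 0x4u

struct model_vma {
	unsigned int flags;
	/* Stands in for "vma->vm_page_prot includes write permission". */
	bool default_prot_writable;
};

/* Simplified stand-in for vma_wants_writenotify(). */
static bool model_wants_writenotify(const struct model_vma *vma)
{
	const unsigned int need = MODEL_VM_WRITE | MODEL_VM_SHARED;

	/* Private or non-writable: the write bit is already clear. */
	if ((vma->flags & need) != need)
		return false;
	/* The check added by the patch: uffd-wp needs write faults. */
	return (vma->flags & MODEL_VM_UFFD_WP) != 0;
}

/* Simplified stand-in for vma_set_page_prot(). */
static void model_set_page_prot(struct model_vma *vma)
{
	const unsigned int need = MODEL_VM_WRITE | MODEL_VM_SHARED;

	vma->default_prot_writable =
		(vma->flags & need) == need && !model_wants_writenotify(vma);
}

/* Mirrors the shape of userfaultfd_set_vm_flags() from the patch. */
static void model_set_vm_flags(struct model_vma *vma, unsigned int flags)
{
	/* True if uffd-wp is set before or after the update. */
	const bool uffd_wp = ((vma->flags | flags) & MODEL_VM_UFFD_WP) != 0;

	vma->flags = flags;
	if ((vma->flags & MODEL_VM_SHARED) && uffd_wp)
		model_set_page_prot(vma);
}

/* Register uffd-wp, then unregister it again, on a shared writable VMA. */
static void model_selfcheck(void)
{
	struct model_vma vma = {
		.flags = MODEL_VM_WRITE | MODEL_VM_SHARED,
		.default_prot_writable = true,
	};

	model_set_vm_flags(&vma, vma.flags | MODEL_VM_UFFD_WP);
	assert(!vma.default_prot_writable); /* default is write-protected */

	model_set_vm_flags(&vma, vma.flags & ~MODEL_VM_UFFD_WP);
	assert(vma.default_prot_writable);  /* default is writable again */
}
```

In this model, checking (old flags | new flags) is what makes the unregister path recompute the default as well, so the write bit comes back into the default protection once uffd-wp is gone.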