From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Kravetz, peterx@redhat.com, Mike Rapoport, Axel Rasmussen,
    Andrea Arcangeli, Hugh Dickins, "Kirill A. Shutemov", Andrew Morton,
    Jerome Glisse
Subject: [PATCH 2/6] mm/userfaultfd: Fix uffd-wp special cases for fork()
Date: Wed, 28 Apr 2021 18:50:26 -0400
Message-Id: <20210428225030.9708-3-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210428225030.9708-1-peterx@redhat.com>
References: <20210428225030.9708-1-peterx@redhat.com>

We tried to do something similar in b569a1760782 ("userfaultfd: wp: drop
_PAGE_UFFD_WP properly when fork") previously, but that fix did not get
everything right.  A few fixes along this code path:

1. We were referencing the VM_UFFD_WP vm_flags of the _old_ vma rather
   than the new vma.  That was overlooked in b569a1760782, so it did not
   work as expected.  Thanks to the recent rework of the fork code
   (7a4830c380f3a8b3), we can now easily get hold of the new vma, so
   switch the checks to that.

2. Dropping the uffd-wp bit in copy_huge_pmd() could be wrong if the
   huge pmd is a migration huge pmd.  When that happens, we should use
   pmd_swp_uffd_wp() instead of pmd_uffd_wp().  The fix is simply to
   handle the two cases separately.

3. The uffd-wp bit was not carried over for a write migration huge pmd
   entry.  This also happens in copy_huge_pmd(), where we convert a
   write huge migration entry into a read one.

4. In copy_nonpresent_pte(), drop the uffd-wp bit if necessary for swap
   ptes.

5. In copy_present_page(), when COW is enforced during fork(), we also
   need to pass over the uffd-wp bit if VM_UFFD_WP is armed on the new
   vma and the pte to be copied has the uffd-wp bit set.

Remove the comment in copy_present_pte() about this: commenting only
there would not help much, and commenting everywhere would be overkill.
Let's assume the commit messages would help.  The rule all of these
fixes apply is sketched below.
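Illustrative only (the helper name below is made up and nothing like it
exists in the tree; it merely restates the rule using helpers that do
appear in the diff): key the decision off the destination vma, and use
the _swp_ variants when the pmd is a migration (swap) entry:

/* Hypothetical sketch, not part of this patch. */
static inline pmd_t fork_sanitize_uffd_wp_pmd(pmd_t pmd,
					      struct vm_area_struct *dst_vma)
{
	/* Keep the bit only if the new (child) vma still has VM_UFFD_WP armed */
	if (userfaultfd_wp(dst_vma))
		return pmd;
	if (is_swap_pmd(pmd))
		/* Migration entry: the uffd-wp bit lives in the swap pmd format */
		return pmd_swp_clear_uffd_wp(pmd);
	return pmd_clear_uffd_wp(pmd);
}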
Cc: Jerome Glisse
Cc: Mike Rapoport
Fixes: b569a1760782f3da03ff718d61f74163dea599ff
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/huge_mm.h |  2 +-
 mm/huge_memory.c        | 23 ++++++++++-------------
 mm/memory.c             | 25 +++++++++++++------------
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 9626fda5efcea..60dad7c88d72b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -10,7 +10,7 @@
 vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-		  struct vm_area_struct *vma);
+		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma);
 void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd);
 int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		  pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 22bf2d0fff79b..20a4569895254 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1014,7 +1014,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-		  struct vm_area_struct *vma)
+		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
	spinlock_t *dst_ptl, *src_ptl;
	struct page *src_page;
@@ -1023,7 +1023,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
	int ret = -ENOMEM;
 
	/* Skip if can be re-fill on fault */
-	if (!vma_is_anonymous(vma))
+	if (!vma_is_anonymous(dst_vma))
		return 0;
 
	pgtable = pte_alloc_one(dst_mm);
@@ -1037,14 +1037,6 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
	ret = -EAGAIN;
	pmd = *src_pmd;
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vma->vm_flags & VM_UFFD_WP))
-		pmd = pmd_clear_uffd_wp(pmd);
-
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
	if (unlikely(is_swap_pmd(pmd))) {
		swp_entry_t entry = pmd_to_swp_entry(pmd);
@@ -1055,11 +1047,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
			pmd = swp_entry_to_pmd(entry);
			if (pmd_swp_soft_dirty(*src_pmd))
				pmd = pmd_swp_mksoft_dirty(pmd);
+			if (pmd_swp_uffd_wp(*src_pmd))
+				pmd = pmd_swp_mkuffd_wp(pmd);
			set_pmd_at(src_mm, addr, src_pmd, pmd);
		}
		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
		mm_inc_nr_ptes(dst_mm);
		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+		if (!userfaultfd_wp(dst_vma))
+			pmd = pmd_swp_clear_uffd_wp(pmd);
		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
		ret = 0;
		goto out_unlock;
@@ -1095,11 +1091,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
	 * best effort that the pinned pages won't be replaced by another
	 * random page during the coming copy-on-write.
	 */
-	if (unlikely(page_needs_cow_for_dma(vma, src_page))) {
+	if (unlikely(page_needs_cow_for_dma(src_vma, src_page))) {
		pte_free(dst_mm, pgtable);
		spin_unlock(src_ptl);
		spin_unlock(dst_ptl);
-		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		__split_huge_pmd(src_vma, src_pmd, addr, false, NULL);
		return -EAGAIN;
	}
 
@@ -1109,8 +1105,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 out_zero_page:
	mm_inc_nr_ptes(dst_mm);
	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-
	pmdp_set_wrprotect(src_mm, addr, src_pmd);
+	if (!userfaultfd_wp(dst_vma))
+		pmd = pmd_clear_uffd_wp(pmd);
	pmd = pmd_mkold(pmd_wrprotect(pmd));
	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
diff --git a/mm/memory.c b/mm/memory.c
index 045daf58608f7..a17a53a7dade6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -708,10 +708,10 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 static unsigned long
 copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-		unsigned long addr, int *rss)
+		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long addr, int *rss)
 {
-	unsigned long vm_flags = vma->vm_flags;
+	unsigned long vm_flags = dst_vma->vm_flags;
	pte_t pte = *src_pte;
	struct page *page;
	swp_entry_t entry = pte_to_swp_entry(pte);
@@ -780,6 +780,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
			set_pte_at(src_mm, addr, src_pte, pte);
		}
	}
+	if (!userfaultfd_wp(dst_vma))
+		pte = pte_swp_clear_uffd_wp(pte);
	set_pte_at(dst_mm, addr, dst_pte, pte);
	return 0;
 }
@@ -845,6 +847,9 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
	/* All done, just insert the new page copy in the child */
	pte = mk_pte(new_page, dst_vma->vm_page_prot);
	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
+	if (userfaultfd_pte_wp(dst_vma, *src_pte))
+		/* Uffd-wp needs to be delivered to dest pte as well */
+		pte = pte_wrprotect(pte_mkuffd_wp(pte));
	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
	return 0;
 }
@@ -894,12 +899,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
		pte = pte_mkclean(pte);
	pte = pte_mkold(pte);
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vm_flags & VM_UFFD_WP))
+	if (!userfaultfd_wp(dst_vma))
		pte = pte_clear_uffd_wp(pte);
 
	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
@@ -974,7 +974,8 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
		if (unlikely(!pte_present(*src_pte))) {
			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
							dst_pte, src_pte,
-							src_vma, addr, rss);
+							dst_vma, src_vma,
+							addr, rss);
			if (entry.val)
				break;
			progress += 8;
@@ -1051,8 +1052,8 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
			|| pmd_devmap(*src_pmd)) {
			int err;
			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
-			err = copy_huge_pmd(dst_mm, src_mm,
-					    dst_pmd, src_pmd, addr, src_vma);
+			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
+					    addr, dst_vma, src_vma);
			if (err == -ENOMEM)
				return -ENOMEM;
			if (!err)
-- 
2.26.2
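For context, the user-visible scenario these fixes keep working looks
roughly like this from userspace.  This is an illustrative, untested
sketch, not part of the patch: only the uapi names from
<linux/userfaultfd.h> are real, the program structure is made up, and
all error handling is omitted.  It registers a page in write-protect
mode, arms uffd-wp on it, and forks with UFFD_FEATURE_EVENT_FORK so the
child vma keeps VM_UFFD_WP; the child's first write is then expected to
trap to the monitor, which is exactly what a dropped or mis-copied
uffd-wp bit during fork() would break.

/* uffd-wp fork scenario, sketch only */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	/* Ask for wp faults and for the uffd context to follow fork() */
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP | UFFD_FEATURE_EVENT_FORK,
	};
	ioctl(uffd, UFFDIO_API, &api);

	char *buf = mmap(NULL, psize, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	memset(buf, 1, psize);			/* make the pte present */

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)buf, .len = psize },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	struct uffdio_writeprotect wp = {
		.range = { .start = (unsigned long)buf, .len = psize },
		.mode = UFFDIO_WRITEPROTECT_MODE_WP,
	};
	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);	/* arm uffd-wp on the pte */

	if (fork() == 0) {
		/*
		 * The child vma still has VM_UFFD_WP (fork event enabled),
		 * so this write should trap to the monitor and block until
		 * resolved.  A lost uffd-wp bit would let it go through
		 * silently instead.
		 */
		buf[0] = 2;
		_exit(0);
	}

	/* A real monitor would read the fork/wp-fault events on uffd here. */
	return 0;
}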