From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Brian Geffon, Pavel Emelyanov, Mike Kravetz, David Hildenbrand,
    peterx@redhat.com, Martin Cracauer, Andrea Arcangeli, Mel Gorman,
    Bobby Powers, Mike Rapoport, Kirill A. Shutemov, Maya Gokhale,
    Johannes Weiner, Marty McFadden, Denis Plotnikov, Hugh Dickins,
    Dr. David Alan Gilbert, Jerome Glisse
Subject: [PATCH v6 10/19] userfaultfd: wp: support swap and page migration
Date: Thu, 20 Feb 2020 11:31:03 -0500
Message-Id: <20200220163112.11409-11-peterx@redhat.com>
In-Reply-To: <20200220163112.11409-1-peterx@redhat.com>
References: <20200220163112.11409-1-peterx@redhat.com>

For both swap and page migration, we use bit 2 of the entry to identify
whether the entry is uffd write-protected.  It plays a similar role to the
existing soft-dirty bit in swap entries, but is used only to keep the
uffd-wp tracking for a specific PTE/PMD.

Something special here is that when we recover the uffd-wp bit from a
swap/migration entry back into the PTE, we also need to take care of the
_PAGE_RW bit and make sure it is cleared; otherwise, even with the
_PAGE_UFFD_WP bit set, we cannot trap the write at all.

In change_pte_range() we currently do nothing for uffd if the PTE is a
swap entry.  That can lead to data mismatch if the page we are going to
write-protect is swapped out while the UFFDIO_WRITEPROTECT is being sent.
This patch therefore applies/removes the uffd-wp bit for swap entries as
well.
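As an aside (not part of the patch): below is a minimal userspace model of
the round trip described above.  It only illustrates the idea that the
uffd-wp bit is carried into the swap entry on swap-out and, on swap-in, is
restored together with write-protecting the PTE, since _PAGE_UFFD_WP alone
would not make the write fault.  All bit positions and helper names here
are illustrative stand-ins, not the kernel's real pgtable/swapops helpers.

/* Toy model, compiles as a standalone C program. */
#include <stdint.h>
#include <stdio.h>

#define PTE_RW       (1ULL << 1)   /* stand-in for _PAGE_RW */
#define PTE_UFFD_WP  (1ULL << 2)   /* stand-in for _PAGE_UFFD_WP */
#define SWP_UFFD_WP  (1ULL << 2)   /* uffd-wp bit carried in the swap entry */

typedef uint64_t pte_t;

/* Swap-out path: preserve the uffd-wp bit in the swap entry. */
static pte_t make_swap_pte(pte_t present_pte)
{
	pte_t swp = 0;			/* swap type/offset omitted */

	if (present_pte & PTE_UFFD_WP)
		swp |= SWP_UFFD_WP;	/* analogue of pte_swp_mkuffd_wp() */
	return swp;
}

/* Swap-in path: restore uffd-wp and make sure the PTE is write-protected. */
static pte_t restore_present_pte(pte_t swp_pte)
{
	pte_t pte = PTE_RW;		/* freshly built, writable by default */

	if (swp_pte & SWP_UFFD_WP) {
		pte |= PTE_UFFD_WP;	/* analogue of pte_mkuffd_wp() */
		pte &= ~PTE_RW;		/* analogue of pte_wrprotect(); without
					 * this the write would not fault */
	}
	return pte;
}

int main(void)
{
	pte_t pte = PTE_UFFD_WP;	/* uffd-wp protected, then swapped out */
	pte_t swp = make_swap_pte(pte);
	pte_t restored = restore_present_pte(swp);

	printf("uffd-wp survived swap: %s, writable after swap-in: %s\n",
	       (restored & PTE_UFFD_WP) ? "yes" : "no",
	       (restored & PTE_RW) ? "yes" : "no");
	return 0;
}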
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/swapops.h |  2 ++
 mm/huge_memory.c        |  3 +++
 mm/memory.c             |  8 ++++++++
 mm/migrate.c            |  6 ++++++
 mm/mprotect.c           | 28 +++++++++++++++++-----------
 mm/rmap.c               |  6 ++++++
 6 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 877fd239b6ff..9a6f06de183b 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -68,6 +68,8 @@ static inline swp_entry_t pte_to_swp_entry(pte_t pte)
 
 	if (pte_swp_soft_dirty(pte))
 		pte = pte_swp_clear_soft_dirty(pte);
+	if (pte_swp_uffd_wp(pte))
+		pte = pte_swp_clear_uffd_wp(pte);
 	arch_entry = __pte_to_swp_entry(pte);
 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 134bef68a1de..ef18ad16b7ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2252,6 +2252,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		write = is_write_migration_entry(entry);
 		young = false;
 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
+		uffd_wp = pmd_swp_uffd_wp(old_pmd);
 	} else {
 		page = pmd_page(old_pmd);
 		if (pmd_dirty(old_pmd))
@@ -2284,6 +2285,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			entry = swp_entry_to_pte(swp_entry);
 			if (soft_dirty)
 				entry = pte_swp_mksoft_dirty(entry);
+			if (uffd_wp)
+				entry = pte_swp_mkuffd_wp(entry);
 		} else {
 			entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
 			entry = maybe_mkwrite(entry, vma);
diff --git a/mm/memory.c b/mm/memory.c
index 557837ec29c3..103c1cf9b794 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -733,6 +733,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 				pte = swp_entry_to_pte(entry);
 				if (pte_swp_soft_dirty(*src_pte))
 					pte = pte_swp_mksoft_dirty(pte);
+				if (pte_swp_uffd_wp(*src_pte))
+					pte = pte_swp_mkuffd_wp(pte);
 				set_pte_at(src_mm, addr, src_pte, pte);
 			}
 		} else if (is_device_private_entry(entry)) {
@@ -762,6 +764,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			    is_cow_mapping(vm_flags)) {
 				make_device_private_entry_read(&entry);
 				pte = swp_entry_to_pte(entry);
+				if (pte_swp_uffd_wp(*src_pte))
+					pte = pte_swp_mkuffd_wp(pte);
 				set_pte_at(src_mm, addr, src_pte, pte);
 			}
 		}
@@ -3079,6 +3083,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	flush_icache_page(vma, page);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
+	if (pte_swp_uffd_wp(vmf->orig_pte)) {
+		pte = pte_mkuffd_wp(pte);
+		pte = pte_wrprotect(pte);
+	}
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
 	vmf->orig_pte = pte;
diff --git a/mm/migrate.c b/mm/migrate.c
index b1092876e537..73cbdbf69fc5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -243,11 +243,15 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 		entry = pte_to_swp_entry(*pvmw.pte);
 		if (is_write_migration_entry(entry))
 			pte = maybe_mkwrite(pte, vma);
+		else if (pte_swp_uffd_wp(*pvmw.pte))
+			pte = pte_mkuffd_wp(pte);
 
 		if (unlikely(is_zone_device_page(new))) {
 			if (is_device_private_page(new)) {
 				entry = make_device_private_entry(new, pte_write(pte));
 				pte = swp_entry_to_pte(entry);
+				if (pte_swp_uffd_wp(*pvmw.pte))
+					pte = pte_mkuffd_wp(pte);
 			}
 		}
 
@@ -2318,6 +2322,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			swp_pte = swp_entry_to_pte(entry);
 			if (pte_soft_dirty(pte))
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			if (pte_uffd_wp(pte))
+				swp_pte = pte_swp_mkuffd_wp(swp_pte);
 			set_pte_at(mm, addr, ptep, swp_pte);
 
 			/*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 22a1c78e3f51..104ac88163d4 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -139,11 +139,11 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
+		} else if (is_swap_pte(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
+			pte_t newpte;
 
 			if (is_write_migration_entry(entry)) {
-				pte_t newpte;
 				/*
 				 * A protection check is difficult so
 				 * just be safe and disable write
@@ -152,22 +152,28 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				newpte = swp_entry_to_pte(entry);
 				if (pte_swp_soft_dirty(oldpte))
 					newpte = pte_swp_mksoft_dirty(newpte);
-				set_pte_at(vma->vm_mm, addr, pte, newpte);
-
-				pages++;
-			}
-
-			if (is_write_device_private_entry(entry)) {
-				pte_t newpte;
-
+				if (pte_swp_uffd_wp(oldpte))
+					newpte = pte_swp_mkuffd_wp(newpte);
+			} else if (is_write_device_private_entry(entry)) {
 				/*
 				 * We do not preserve soft-dirtiness. See
 				 * copy_one_pte() for explanation.
 				 */
 				make_device_private_entry_read(&entry);
 				newpte = swp_entry_to_pte(entry);
-				set_pte_at(vma->vm_mm, addr, pte, newpte);
+				if (pte_swp_uffd_wp(oldpte))
+					newpte = pte_swp_mkuffd_wp(newpte);
+			} else {
+				newpte = oldpte;
+			}
 
+			if (uffd_wp)
+				newpte = pte_swp_mkuffd_wp(newpte);
+			else if (uffd_wp_resolve)
+				newpte = pte_swp_clear_uffd_wp(newpte);
+
+			if (!pte_same(oldpte, newpte)) {
+				set_pte_at(vma->vm_mm, addr, pte, newpte);
 				pages++;
 			}
 		}
diff --git a/mm/rmap.c b/mm/rmap.c
index b3e381919835..ce935d0ddf75 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1497,6 +1497,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			swp_pte = swp_entry_to_pte(entry);
 			if (pte_soft_dirty(pteval))
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			if (pte_uffd_wp(pteval))
+				swp_pte = pte_swp_mkuffd_wp(swp_pte);
 			set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
 			/*
 			 * No need to invalidate here it will synchronize on
@@ -1596,6 +1598,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			swp_pte = swp_entry_to_pte(entry);
 			if (pte_soft_dirty(pteval))
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			if (pte_uffd_wp(pteval))
+				swp_pte = pte_swp_mkuffd_wp(swp_pte);
 			set_pte_at(mm, address, pvmw.pte, swp_pte);
 			/*
 			 * No need to invalidate here it will synchronize on
@@ -1662,6 +1666,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			swp_pte = swp_entry_to_pte(entry);
 			if (pte_soft_dirty(pteval))
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			if (pte_uffd_wp(pteval))
+				swp_pte = pte_swp_mkuffd_wp(swp_pte);
 			set_pte_at(mm, address, pvmw.pte, swp_pte);
 			/* Invalidate as we cleared the pte */
 			mmu_notifier_invalidate_range(mm, address,
-- 
2.24.1