linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	peterx@redhat.com, Jerome Glisse <jglisse@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>
Subject: [PATCH v2 22/24] hugetlb/userfaultfd: Only drop uffd-wp special pte if required
Date: Tue, 27 Apr 2021 12:13:15 -0400	[thread overview]
Message-ID: <20210427161317.50682-23-peterx@redhat.com> (raw)
In-Reply-To: <20210427161317.50682-1-peterx@redhat.com>

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if
unmapping an entire vma or synchronized such that faults can not race with the
unmap operation.  This requires passing zap_flags all the way to the lowest
level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originated in hugetlbfs code will pass the
ZAP_FLAG_DROP_FILE_UFFD_WP flag as synchronization is in place to prevent
faults.  The exception is hole punch which will first unmap without any
synchronization.  Later when hole punch actually removes the page from the
file, it will check to see if there was a subsequent fault and if so take the
hugetlb fault mutex while unmapping again.  This second unmap will pass in
ZAP_FLAG_DROP_FILE_UFFD_WP.

The core justification of "whether to apply ZAP_FLAG_DROP_FILE_UFFD_WP flag
when unmap a hugetlb range" is (IMHO): we should never reach a state when a
page fault could errornously fault in a page-cache page that was wr-protected
to be writable, even in an extremely short period.  That could happen if
e.g. we pass ZAP_FLAG_DROP_FILE_UFFD_WP in hugetlbfs_punch_hole() when calling
hugetlb_vmdelete_list(), because if a page fault triggers after that call and
before the remove_inode_hugepages() right after it, the page cache can be
mapped writable again in the small window, which can cause data corruption.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c    | 15 +++++++++------
 include/linux/hugetlb.h |  8 +++++---
 mm/hugetlb.c            | 27 +++++++++++++++++++++------
 mm/memory.c             |  5 ++++-
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a2a42335e8fd2..9b383c39756a5 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -399,7 +399,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -432,7 +433,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-									NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -510,7 +511,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 				mutex_lock(&hugetlb_fault_mutex_table[hash]);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
-					(index + 1) * pages_per_huge_page(h));
+					(index + 1) * pages_per_huge_page(h),
+					ZAP_FLAG_DROP_FILE_UFFD_WP);
 				i_mmap_unlock_write(mapping);
 			}
 
@@ -576,7 +578,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_FILE_UFFD_WP);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
@@ -609,8 +612,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-						hole_start >> PAGE_SHIFT,
-						hole_end  >> PAGE_SHIFT);
+					      hole_start >> PAGE_SHIFT,
+					      hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 652660fd6ec8a..5fa84bbefa628 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -121,11 +121,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -358,7 +359,8 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fa9af9c893512..f73a236b5a835 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4096,7 +4096,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			    unsigned long start, unsigned long end,
-			    struct page *ref_page)
+			    struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -4148,6 +4148,19 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			continue;
 		}
 
+		if (unlikely(is_swap_special_pte(pte))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+			/*
+			 * Only drop the special swap uffd-wp pte if
+			 * e.g. unmapping a vma or punching a hole (with proper
+			 * lock held so that concurrent page fault won't happen).
+			 */
+			if (zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP)
+				huge_pte_clear(mm, address, ptep, sz);
+			spin_unlock(ptl);
+			continue;
+		}
+
 		/*
 		 * Migrating hugepage or HWPoisoned hugepage is already
 		 * unmapped and its refcount is dropped, so just clear pte here.
@@ -4199,9 +4212,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -4217,12 +4231,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mmu_gather tlb;
 
 	tlb_gather_mmu(&tlb, vma->vm_mm);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -4277,7 +4292,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index f1cdc613b5887..99741c9254c5b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1515,8 +1515,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * safe to do nothing in this case.
 			 */
 			if (vma->vm_file) {
+				unsigned long zap_flags = details ?
+				    details->zap_flags : 0;
 				i_mmap_lock_write(vma->vm_file->f_mapping);
-				__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+				__unmap_hugepage_range_final(tlb, vma, start, end,
+							     NULL, zap_flags);
 				i_mmap_unlock_write(vma->vm_file->f_mapping);
 			}
 		} else
-- 
2.26.2



  parent reply	other threads:[~2021-04-27 16:14 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-27 16:12 [PATCH v2 00/24] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-04-27 16:12 ` [PATCH v2 01/24] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-04-27 16:12 ` [PATCH v2 02/24] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-04-27 16:12 ` [PATCH v2 03/24] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-04-27 16:12 ` [PATCH v2 04/24] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-04-27 16:12 ` [PATCH v2 05/24] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-04-27 18:54   ` Peter Xu
2021-04-27 16:12 ` [PATCH v2 06/24] mm: Drop first_index/last_index in zap_details Peter Xu
2021-04-27 16:13 ` [PATCH v2 07/24] mm: Introduce zap_details.zap_flags Peter Xu
2021-04-27 16:13 ` [PATCH v2 08/24] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-04-27 16:13 ` [PATCH v2 09/24] mm: Pass zap_flags into unmap_mapping_pages() Peter Xu
2021-04-27 16:13 ` [PATCH v2 10/24] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-04-27 16:13 ` [PATCH v2 11/24] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
2021-04-27 16:13 ` [PATCH v2 12/24] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-04-27 16:13 ` [PATCH v2 13/24] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-04-27 16:13 ` [PATCH v2 14/24] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-04-27 16:13 ` [PATCH v2 15/24] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
2021-04-27 16:13 ` [PATCH v2 16/24] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-04-28  0:07   ` Peter Xu
2021-04-27 16:13 ` [PATCH v2 17/24] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-04-27 16:13 ` [PATCH v2 18/24] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-04-27 16:13 ` [PATCH v2 19/24] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-04-27 16:13 ` [PATCH v2 20/24] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-04-27 16:13 ` [PATCH v2 21/24] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-04-27 16:13 ` Peter Xu [this message]
2021-04-27 16:13 ` [PATCH v2 23/24] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-04-27 16:13 ` [PATCH v2 24/24] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
2021-05-12 19:00 ` [PATCH v2 00/24] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-05-14  7:07   ` Hugh Dickins
2021-05-14 13:18     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210427161317.50682-23-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=hughd@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox