linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org,
	willy@infradead.org, sidhartha.kumar@oracle.com,
	mike.kravetz@oracle.com, jane.chu@oracle.com,
	naoya.horiguchi@nec.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Tue, 28 Feb 2023 20:23:04 +0800	[thread overview]
Message-ID: <20230228122308.2972219-2-fengwei.yin@intel.com> (raw)
In-Reply-To: <20230228122308.2972219-1-fengwei.yin@intel.com>

It's to prepare the batched rmap update for large folio.
No need to looped handle hugetlb. Just handle hugetlb and
bail out early.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 121 insertions(+), 79 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 8632e02661ac..0f09518d6f30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1443,6 +1443,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	munlock_vma_folio(folio, vma, compound);
 }
 
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+	bool ret = true, anon = folio_test_anon(folio);
+
+	/*
+	 * The try_to_unmap() is only passed a hugetlb page
+	 * in the case where the hugetlb page is poisoned.
+	 */
+	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+	/*
+	 * huge_pmd_unshare may unmap an entire PMD page.
+	 * There is no way of knowing exactly which PMDs may
+	 * be cached for this mm, so we must flush them all.
+	 * start/end were already adjusted above to cover this
+	 * range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
+	/*
+	 * To call huge_pmd_unshare, i_mmap_rwsem must be
+	 * held in write mode.  Caller needs to explicitly
+	 * do this outside rmap routines.
+	 *
+	 * We also must hold hugetlb vma_lock in write mode.
+	 * Lock order dictates acquiring vma_lock BEFORE
+	 * i_mmap_rwsem.  We can only try lock here and fail
+	 * if unsuccessful.
+	 */
+	if (!anon) {
+		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (!hugetlb_vma_trylock_write(vma)) {
+			ret = false;
+			goto out;
+		}
+		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+			hugetlb_vma_unlock_write(vma);
+			flush_tlb_range(vma,
+					range.start, range.end);
+			mmu_notifier_invalidate_range(mm,
+					range.start, range.end);
+			/*
+			 * The ref count of the PMD page was
+			 * dropped which is part of the way map
+			 * counting is done for shared PMDs.
+			 * Return 'true' here.  When there is
+			 * no other sharing, huge_pmd_unshare
+			 * returns false and we will unmap the
+			 * actual page and drop map count
+			 * to zero.
+			 */
+			goto out;
+		}
+		hugetlb_vma_unlock_write(vma);
+	}
+	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	/* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+	pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+	set_huge_pte_at(mm, address, pvmw.pte, pteval);
+	hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+	/*
+	 * No need to call mmu_notifier_invalidate_range() it has be
+	 * done above for all cases requiring it to happen under page
+	 * table lock before mmu_notifier_invalidate_range_end()
+	 *
+	 * See Documentation/mm/mmu_notifier.rst
+	 */
+	page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
+	/* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
+	 * necessary to call mlock_drain_local().
+	 */
+	folio_put(folio);
+
+out:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1506,86 +1603,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			break;
 		}
 
+		address = pvmw.address;
+		if (folio_test_hugetlb(folio)) {
+			ret = try_to_unmap_one_hugetlb(folio, vma, range,
+							pvmw, address, flags);
+
+			/* no need to loop for hugetlb */
+			page_vma_mapped_walk_done(&pvmw);
+			break;
+		}
+
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio)) {
-			bool anon = folio_test_anon(folio);
-
-			/*
-			 * The try_to_unmap() is only passed a hugetlb page
-			 * in the case where the hugetlb page is poisoned.
-			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		/* Nuke the page table entry. */
+		if (should_defer_flush(mm, flags)) {
 			/*
-			 * huge_pmd_unshare may unmap an entire PMD page.
-			 * There is no way of knowing exactly which PMDs may
-			 * be cached for this mm, so we must flush them all.
-			 * start/end were already adjusted above to cover this
-			 * range.
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the folio.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
 			 */
-			flush_cache_range(vma, range.start, range.end);
+			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode.  Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem.  We can only try lock here and fail
-			 * if unsuccessful.
-			 */
-			if (!anon) {
-				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-				if (!hugetlb_vma_trylock_write(vma)) {
-					page_vma_mapped_walk_done(&pvmw);
-					ret = false;
-					break;
-				}
-				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
-					hugetlb_vma_unlock_write(vma);
-					flush_tlb_range(vma,
-						range.start, range.end);
-					mmu_notifier_invalidate_range(mm,
-						range.start, range.end);
-					/*
-					 * The ref count of the PMD page was
-					 * dropped which is part of the way map
-					 * counting is done for shared PMDs.
-					 * Return 'true' here.  When there is
-					 * no other sharing, huge_pmd_unshare
-					 * returns false and we will unmap the
-					 * actual page and drop map count
-					 * to zero.
-					 */
-					page_vma_mapped_walk_done(&pvmw);
-					break;
-				}
-				hugetlb_vma_unlock_write(vma);
-			}
-			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
-			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-			/* Nuke the page table entry. */
-			if (should_defer_flush(mm, flags)) {
-				/*
-				 * We clear the PTE but do not flush so potentially
-				 * a remote CPU could still be writing to the folio.
-				 * If the entry was previously clean then the
-				 * architecture must guarantee that a clear->dirty
-				 * transition on a cached TLB entry is written through
-				 * and traps if the PTE is unmapped.
-				 */
-				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-			} else {
-				pteval = ptep_clear_flush(vma, address, pvmw.pte);
-			}
+			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
 		/*
@@ -1604,14 +1652,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
-				set_huge_pte_at(mm, address, pvmw.pte, pteval);
-			} else {
-				dec_mm_counter(mm, mm_counter(&folio->page));
-				set_pte_at(mm, address, pvmw.pte, pteval);
-			}
-
+			dec_mm_counter(mm, mm_counter(&folio->page));
+			set_pte_at(mm, address, pvmw.pte, pteval);
 		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no
-- 
2.30.2



  reply	other threads:[~2023-02-28 12:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-28 12:23 [PATCH v2 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
2023-02-28 12:23 ` Yin Fengwei [this message]
2023-02-28 12:23 ` [PATCH v2 2/5] rmap: move page unmap operation to dedicated function Yin Fengwei
2023-02-28 12:23 ` [PATCH v2 3/5] rmap: cleanup exit path of try_to_unmap_one_page() Yin Fengwei
2023-02-28 12:23 ` [PATCH v2 4/5] rmap:addd folio_remove_rmap_range() Yin Fengwei
2023-02-28 12:23 ` [PATCH v2 5/5] try_to_unmap_one: batched remove rmap, update folio refcount Yin Fengwei
2023-02-28 20:28 ` [PATCH v2 0/5] batched remove rmap in try_to_unmap_one() Andrew Morton
2023-03-01  1:44   ` Yin, Fengwei
2023-03-02 10:04     ` David Hildenbrand
2023-03-02 13:32       ` Yin, Fengwei
2023-03-02 14:23         ` David Hildenbrand
2023-03-02 14:33           ` Matthew Wilcox
2023-03-02 14:55             ` David Hildenbrand
2023-03-03  2:44               ` Yin, Fengwei
2023-03-03  2:26           ` Yin, Fengwei
2023-03-06  9:11             ` Yin Fengwei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230228122308.2972219-2-fengwei.yin@intel.com \
    --to=fengwei.yin@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=jane.chu@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=sidhartha.kumar@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox