From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org,
willy@infradead.org, sidhartha.kumar@oracle.com,
mike.kravetz@oracle.com, jane.chu@oracle.com,
naoya.horiguchi@nec.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Tue, 28 Feb 2023 20:23:04 +0800
Message-ID: <20230228122308.2972219-2-fengwei.yin@intel.com>
In-Reply-To: <20230228122308.2972219-1-fengwei.yin@intel.com>

This prepares for batched rmap updates for large folios. There is
no need to handle hugetlb inside the page table walk loop: handle
it in a dedicated function and bail out early.
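
As a rough sketch (condensed from the diff below, with elisions marked
as comments; not a standalone compilable unit), the dispatch inside
try_to_unmap_one() becomes:

	while (page_vma_mapped_walk(&pvmw)) {
		/* ... preceding checks inside the loop elided ... */

		address = pvmw.address;
		if (folio_test_hugetlb(folio)) {
			/*
			 * A poisoned hugetlb folio is handled in one go;
			 * there is no need to keep walking page table
			 * entries for it.
			 */
			ret = try_to_unmap_one_hugetlb(folio, vma, range,
						       pvmw, address, flags);
			page_vma_mapped_walk_done(&pvmw);
			break;
		}

		/* non-hugetlb PTE handling continues as before */
	}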
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 121 insertions(+), 79 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 8632e02661ac..0f09518d6f30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1443,6 +1443,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
munlock_vma_folio(folio, vma, compound);
}
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+ struct vm_area_struct *vma, struct mmu_notifier_range range,
+ struct page_vma_mapped_walk pvmw, unsigned long address,
+ enum ttu_flags flags)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ pte_t pteval;
+ bool ret = true, anon = folio_test_anon(folio);
+
+ /*
+ * The try_to_unmap() is only passed a hugetlb page
+ * in the case where the hugetlb page is poisoned.
+ */
+ VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+ /*
+ * huge_pmd_unshare may unmap an entire PMD page.
+ * There is no way of knowing exactly which PMDs may
+ * be cached for this mm, so we must flush them all.
+ * start/end were already adjusted above to cover this
+ * range.
+ */
+ flush_cache_range(vma, range.start, range.end);
+
+ /*
+ * To call huge_pmd_unshare, i_mmap_rwsem must be
+ * held in write mode. Caller needs to explicitly
+ * do this outside rmap routines.
+ *
+ * We also must hold hugetlb vma_lock in write mode.
+ * Lock order dictates acquiring vma_lock BEFORE
+ * i_mmap_rwsem. We can only try lock here and fail
+ * if unsuccessful.
+ */
+ if (!anon) {
+ VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+ if (!hugetlb_vma_trylock_write(vma)) {
+ ret = false;
+ goto out;
+ }
+ if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+ hugetlb_vma_unlock_write(vma);
+ flush_tlb_range(vma,
+ range.start, range.end);
+ mmu_notifier_invalidate_range(mm,
+ range.start, range.end);
+ /*
+ * The ref count of the PMD page was
+ * dropped which is part of the way map
+ * counting is done for shared PMDs.
+ * Return 'true' here. When there is
+ * no other sharing, huge_pmd_unshare
+ * returns false and we will unmap the
+ * actual page and drop map count
+ * to zero.
+ */
+ goto out;
+ }
+ hugetlb_vma_unlock_write(vma);
+ }
+ pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+ /*
+ * Now the pte is cleared. If this pte was uffd-wp armed,
+ * we may want to replace a none pte with a marker pte if
+ * it's file-backed, so we don't lose the tracking info.
+ */
+ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+ /* Set the dirty flag on the folio now the pte is gone. */
+ if (pte_dirty(pteval))
+ folio_mark_dirty(folio);
+
+ /* Update high watermark before we lower rss */
+ update_hiwater_rss(mm);
+
+ /* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+ pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+ set_huge_pte_at(mm, address, pvmw.pte, pteval);
+ hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+ /*
+ * No need to call mmu_notifier_invalidate_range(); it has been
+ * done above for all cases requiring it to happen under page
+ * table lock before mmu_notifier_invalidate_range_end()
+ *
+ * See Documentation/mm/mmu_notifier.rst
+ */
+ page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
+ /* No VM_LOCKED set in vma->vm_flags for hugetlb, so there is
+ * no need to call mlock_drain_local().
+ */
+ folio_put(folio);
+
+out:
+ return ret;
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
@@ -1506,86 +1603,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
+ address = pvmw.address;
+ if (folio_test_hugetlb(folio)) {
+ ret = try_to_unmap_one_hugetlb(folio, vma, range,
+ pvmw, address, flags);
+
+ /* no need to loop for hugetlb */
+ page_vma_mapped_walk_done(&pvmw);
+ break;
+ }
+
subpage = folio_page(folio,
pte_pfn(*pvmw.pte) - folio_pfn(folio));
- address = pvmw.address;
anon_exclusive = folio_test_anon(folio) &&
PageAnonExclusive(subpage);
- if (folio_test_hugetlb(folio)) {
- bool anon = folio_test_anon(folio);
-
- /*
- * The try_to_unmap() is only passed a hugetlb page
- * in the case where the hugetlb page is poisoned.
- */
- VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+ flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+ /* Nuke the page table entry. */
+ if (should_defer_flush(mm, flags)) {
/*
- * huge_pmd_unshare may unmap an entire PMD page.
- * There is no way of knowing exactly which PMDs may
- * be cached for this mm, so we must flush them all.
- * start/end were already adjusted above to cover this
- * range.
+ * We clear the PTE but do not flush so potentially
+ * a remote CPU could still be writing to the folio.
+ * If the entry was previously clean then the
+ * architecture must guarantee that a clear->dirty
+ * transition on a cached TLB entry is written through
+ * and traps if the PTE is unmapped.
*/
- flush_cache_range(vma, range.start, range.end);
+ pteval = ptep_get_and_clear(mm, address, pvmw.pte);
- /*
- * To call huge_pmd_unshare, i_mmap_rwsem must be
- * held in write mode. Caller needs to explicitly
- * do this outside rmap routines.
- *
- * We also must hold hugetlb vma_lock in write mode.
- * Lock order dictates acquiring vma_lock BEFORE
- * i_mmap_rwsem. We can only try lock here and fail
- * if unsuccessful.
- */
- if (!anon) {
- VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
- if (!hugetlb_vma_trylock_write(vma)) {
- page_vma_mapped_walk_done(&pvmw);
- ret = false;
- break;
- }
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
- hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
- mmu_notifier_invalidate_range(mm,
- range.start, range.end);
- /*
- * The ref count of the PMD page was
- * dropped which is part of the way map
- * counting is done for shared PMDs.
- * Return 'true' here. When there is
- * no other sharing, huge_pmd_unshare
- * returns false and we will unmap the
- * actual page and drop map count
- * to zero.
- */
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
- hugetlb_vma_unlock_write(vma);
- }
- pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+ set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
} else {
- flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
- /* Nuke the page table entry. */
- if (should_defer_flush(mm, flags)) {
- /*
- * We clear the PTE but do not flush so potentially
- * a remote CPU could still be writing to the folio.
- * If the entry was previously clean then the
- * architecture must guarantee that a clear->dirty
- * transition on a cached TLB entry is written through
- * and traps if the PTE is unmapped.
- */
- pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
- set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
- } else {
- pteval = ptep_clear_flush(vma, address, pvmw.pte);
- }
+ pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
/*
@@ -1604,14 +1652,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
- if (folio_test_hugetlb(folio)) {
- hugetlb_count_sub(folio_nr_pages(folio), mm);
- set_huge_pte_at(mm, address, pvmw.pte, pteval);
- } else {
- dec_mm_counter(mm, mm_counter(&folio->page));
- set_pte_at(mm, address, pvmw.pte, pteval);
- }
-
+ dec_mm_counter(mm, mm_counter(&folio->page));
+ set_pte_at(mm, address, pvmw.pte, pteval);
} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
/*
* The guest indicated that the page content is of no
--
2.30.2
Thread overview: 16+ messages
2023-02-28 12:23 [PATCH v2 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
2023-02-28 12:23 ` Yin Fengwei [this message]
2023-02-28 12:23 ` [PATCH v2 2/5] rmap: move page unmap operation to dedicated function Yin Fengwei
2023-02-28 12:23 ` [PATCH v2 3/5] rmap: cleanup exit path of try_to_unmap_one_page() Yin Fengwei
2023-02-28 12:23 ` [PATCH v2 4/5] rmap: add folio_remove_rmap_range() Yin Fengwei
2023-02-28 12:23 ` [PATCH v2 5/5] try_to_unmap_one: batched remove rmap, update folio refcount Yin Fengwei
2023-02-28 20:28 ` [PATCH v2 0/5] batched remove rmap in try_to_unmap_one() Andrew Morton
2023-03-01 1:44 ` Yin, Fengwei
2023-03-02 10:04 ` David Hildenbrand
2023-03-02 13:32 ` Yin, Fengwei
2023-03-02 14:23 ` David Hildenbrand
2023-03-02 14:33 ` Matthew Wilcox
2023-03-02 14:55 ` David Hildenbrand
2023-03-03 2:44 ` Yin, Fengwei
2023-03-03 2:26 ` Yin, Fengwei
2023-03-06 9:11 ` Yin Fengwei