* [PATCH v8 0/3] Reclaim lazyfree THP without splitting
@ 2024-06-14  1:51 Lance Yang
  2024-06-14  1:51 ` [PATCH v8 1/3] mm/rmap: remove duplicated exit code in pagewalk loop Lance Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Lance Yang @ 2024-06-14  1:51 UTC (permalink / raw)
  To: akpm
  Cc: willy, sj, baolin.wang, maskray, ziy, ryan.roberts, david,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel, Lance Yang

Hi all,

This series adds support for reclaiming PMD-mapped THP marked as lazyfree
without needing to first split the large folio via split_huge_pmd_address().

When the user no longer requires the pages, they use madvise(MADV_FREE) to
mark the pages as lazy free. Typically, they will not write to that memory
again.

During memory reclaim, if we detect that the large folio and its PMD are both
still clean and there are no unexpected references (such as GUP), we can
simply discard the memory lazily, improving the efficiency of memory
reclamation in this case.
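
For reference, a minimal userspace sketch of how a region could end up as
lazyfree PMD-mapped THPs. This is only an illustration; the 1GiB size, the
MADV_HUGEPAGE hint and the error handling are assumptions, not the exact
reproducer behind the numbers below:

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 1UL << 30;         /* 1GiB of anonymous memory */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED)
                return 1;

        madvise(buf, len, MADV_HUGEPAGE);       /* hint: back with THPs */
        memset(buf, 1, len);                    /* fault in and dirty */

        /*
         * Mark the range as lazy free. As long as the pages stay clean
         * afterwards, reclaim may discard them instead of swapping.
         */
        madvise(buf, len, MADV_FREE);

        pause();                                /* keep the mapping alive */
        return 0;
}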

Performance Testing
===================

On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
mem_cgroup_force_empty() results in the following runtimes in seconds
(shorter is better):

--------------------------------------------
|     Old       |      New       |  Change  |
--------------------------------------------
|   0.683426    |    0.049197    |  -92.80% |
--------------------------------------------
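
The reclaim above was driven through mem_cgroup_force_empty(). A rough timing
harness along the following lines could be used; the cgroup v1 mount point and
the "thp-test" cgroup name are assumptions made for this sketch, not details
taken from the original setup:

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        /* Assumed path: a cgroup v1 memcg that holds the lazyfree mapping. */
        const char *knob = "/sys/fs/cgroup/memory/thp-test/memory.force_empty";
        struct timespec t0, t1;
        int fd = open(knob, O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (write(fd, "0", 1) < 0)      /* triggers mem_cgroup_force_empty() */
                perror("write");
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("force_empty: %.6f s\n",
               (double)(t1.tv_sec - t0.tv_sec) +
               (t1.tv_nsec - t0.tv_nsec) / 1e9);
        close(fd);
        return 0;
}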

---

Changes since v7 [7]
====================
 - mm/rmap: remove duplicated exit code in pagewalk loop
    - Pick RB from Barry - thanks!
    - Rename walk_done_err to walk_abort (per Barry and David)
 - mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
    - Make page_vma_mapped_walk_restart() more general (per David)
    - Squash page_vma_mapped_walk_restart() into this patch (per David)
 - mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
    - Don't unmark a PMD-mapped folio as lazyfree in unmap_huge_pmd_locked()
    - Drop the unused "pmd_mapped" variable (per Baolin)

Changes since v6 [6]
====================
 - mm/rmap: remove duplicated exit code in pagewalk loop
    - Pick RB from David - thanks!
 - mm/rmap: add helper to restart pgtable walk on changes
    - Add the page_vma_mapped_walk_restart() helper to handle scenarios
      where the page table walk needs to be restarted due to changes in
      the page table (suggested by David)
 - mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
    - Pass 'pvmw.address' to split_huge_pmd_locked() (per David)
    - Drop the check for PMD-mapped THP that is missing the mlock (per David)
 - mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
    - Rename the function __discard_trans_pmd_locked() to
      __discard_anon_folio_pmd_locked() (per David)

Changes since v5 [5]
====================
 - mm/rmap: remove duplicated exit code in pagewalk loop
    - Pick RB from Baolin Wang - thanks!
 - mm/mlock: check for THP missing the mlock in try_to_unmap_one()
    - Merge this patch into patch 2 (per Baolin Wang)
 - mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
    - Mark a folio as being backed by swap space if the folio or its PMD
      was redirtied (per Baolin Wang)
    - Use pmdp_huge_clear_flush() to get and flush a PMD entry
      (per Baolin Wang)

Changes since v4 [4]
====================
 - mm/rmap: remove duplicated exit code in pagewalk loop
    - Pick RB from Zi Yan - thanks!
 - mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
    - Remove the redundant alignment (per Baolin Wang)
    - Set pvmw.ptl to NULL after unlocking the PTL (per Baolin Wang)
 - mm/mlock: check for THP missing the mlock in try_to_unmap_one()
    - Check whether the mlock of PMD-mapped THP was missed
      (suggested by Baolin Wang)
 - mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
    - No need to check the TTU_SPLIT_HUGE_PMD flag for unmap_huge_pmd_locked()
      (per Zi Yan)
    - Drain the local mlock batch after folio_remove_rmap_pmd()
      (per Baolin Wang)

Changes since v3 [3]
====================
 - mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
    - Resolve compilation errors by handling the case where
      CONFIG_PGTABLE_HAS_HUGE_LEAVES is undefined (thanks to SeongJae Park)
 - mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
    - Remove the unnecessary conditional compilation directives
      (thanks to Barry Song)
    - Resolve compilation errors due to undefined references to
      unmap_huge_pmd_locked and split_huge_pmd_locked (thanks to Barry)

Changes since v2 [2]
====================
 - Update the changelog (thanks to David Hildenbrand)
 - Support try_to_unmap_one() to unmap PMD-mapped folios
   (thanks a lot to David Hildenbrand and Zi Yan)

Changes since v1 [1]
====================
 - Update the changelog
 - Follow the exact same logic as in try_to_unmap_one() (per David Hildenbrand)
 - Remove the extra code from rmap.c (per Matthew Wilcox)

[1] https://lore.kernel.org/linux-mm/20240417141111.77855-1-ioworker0@gmail.com
[2] https://lore.kernel.org/linux-mm/20240422055213.60231-1-ioworker0@gmail.com
[3] https://lore.kernel.org/linux-mm/20240429132308.38794-1-ioworker0@gmail.com
[4] https://lore.kernel.org/linux-mm/20240501042700.83974-1-ioworker0@gmail.com
[5] https://lore.kernel.org/linux-mm/20240513074712.7608-1-ioworker0@gmail.com
[6] https://lore.kernel.org/linux-mm/20240521040244.48760-1-ioworker0@gmail.com
[7] https://lore.kernel.org/linux-mm/20240610120209.66311-1-ioworker0@gmail.com

Lance Yang (3):
  mm/rmap: remove duplicated exit code in pagewalk loop
  mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
  mm/vmscan: avoid split lazyfree THP during shrink_folio_list()

 include/linux/huge_mm.h |  15 +++++
 include/linux/rmap.h    |  24 ++++++++
 mm/huge_memory.c        | 118 +++++++++++++++++++++++++++++++++-------
 mm/rmap.c               |  68 ++++++++++++-----------
 4 files changed, 174 insertions(+), 51 deletions(-)


base-commit: fb8d20fa1a94f807336ed209d33da8ec15ae6c3a
-- 
2.33.1




* [PATCH v8 1/3] mm/rmap: remove duplicated exit code in pagewalk loop
  2024-06-14  1:51 [PATCH v8 0/3] Reclaim lazyfree THP without splitting Lance Yang
@ 2024-06-14  1:51 ` Lance Yang
  2024-06-14  1:51 ` [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into " Lance Yang
  2024-06-14  1:51 ` [PATCH v8 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list() Lance Yang
  2 siblings, 0 replies; 11+ messages in thread
From: Lance Yang @ 2024-06-14  1:51 UTC (permalink / raw)
  To: akpm
  Cc: willy, sj, baolin.wang, maskray, ziy, ryan.roberts, david,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel, Lance Yang, Barry Song

Introduce the labels walk_done and walk_abort as exit points to eliminate
duplicated exit code in the pagewalk loop.

Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 mm/rmap.c | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index ae250b2b4d55..2d778725e4f5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1681,9 +1681,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			/* Restore the mlock which got missed */
 			if (!folio_test_large(folio))
 				mlock_vma_folio(folio, vma);
-			page_vma_mapped_walk_done(&pvmw);
-			ret = false;
-			break;
+			goto walk_abort;
 		}
 
 		pfn = pte_pfn(ptep_get(pvmw.pte));
@@ -1721,11 +1719,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			if (!anon) {
 				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-				if (!hugetlb_vma_trylock_write(vma)) {
-					page_vma_mapped_walk_done(&pvmw);
-					ret = false;
-					break;
-				}
+				if (!hugetlb_vma_trylock_write(vma))
+					goto walk_abort;
 				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
 					hugetlb_vma_unlock_write(vma);
 					flush_tlb_range(vma,
@@ -1740,8 +1735,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 					 * actual page and drop map count
 					 * to zero.
 					 */
-					page_vma_mapped_walk_done(&pvmw);
-					break;
+					goto walk_done;
 				}
 				hugetlb_vma_unlock_write(vma);
 			}
@@ -1813,9 +1807,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			if (unlikely(folio_test_swapbacked(folio) !=
 					folio_test_swapcache(folio))) {
 				WARN_ON_ONCE(1);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				goto walk_abort;
 			}
 
 			/* MADV_FREE page check */
@@ -1854,23 +1846,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				folio_set_swapbacked(folio);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				goto walk_abort;
 			}
 
 			if (swap_duplicate(entry) < 0) {
 				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				goto walk_abort;
 			}
 			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
 				swap_free(entry);
 				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				goto walk_abort;
 			}
 
 			/* See folio_try_share_anon_rmap(): clear PTE first. */
@@ -1878,9 +1864,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			    folio_try_share_anon_rmap_pte(folio, subpage)) {
 				swap_free(entry);
 				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				goto walk_abort;
 			}
 			if (list_empty(&mm->mmlist)) {
 				spin_lock(&mmlist_lock);
@@ -1920,6 +1904,12 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
 		folio_put(folio);
+		continue;
+walk_abort:
+		ret = false;
+walk_done:
+		page_vma_mapped_walk_done(&pvmw);
+		break;
 	}
 
 	mmu_notifier_invalidate_range_end(&range);
-- 
2.33.1




* [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
  2024-06-14  1:51 [PATCH v8 0/3] Reclaim lazyfree THP without splitting Lance Yang
  2024-06-14  1:51 ` [PATCH v8 1/3] mm/rmap: remove duplicated exit code in pagewalk loop Lance Yang
@ 2024-06-14  1:51 ` Lance Yang
  2024-06-14  7:34   ` David Hildenbrand
  2024-06-14 14:26   ` Zi Yan
  2024-06-14  1:51 ` [PATCH v8 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list() Lance Yang
  2 siblings, 2 replies; 11+ messages in thread
From: Lance Yang @ 2024-06-14  1:51 UTC (permalink / raw)
  To: akpm
  Cc: willy, sj, baolin.wang, maskray, ziy, ryan.roberts, david,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel, Lance Yang

In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
folios, start the pagewalk first, then call split_huge_pmd_address() to
split the folio.

Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 include/linux/huge_mm.h |  6 ++++++
 include/linux/rmap.h    | 24 +++++++++++++++++++++++
 mm/huge_memory.c        | 42 +++++++++++++++++++++--------------------
 mm/rmap.c               | 21 +++++++++++++++------
 4 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7ad41de5eaea..9f720b0731c4 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -428,6 +428,9 @@ static inline bool thp_migration_supported(void)
 	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
 }
 
+void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
+			   pmd_t *pmd, bool freeze, struct folio *folio);
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline bool folio_test_pmd_mappable(struct folio *folio)
@@ -490,6 +493,9 @@ static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze, struct folio *folio) {}
 static inline void split_huge_pmd_address(struct vm_area_struct *vma,
 		unsigned long address, bool freeze, struct folio *folio) {}
+static inline void split_huge_pmd_locked(struct vm_area_struct *vma,
+					 unsigned long address, pmd_t *pmd,
+					 bool freeze, struct folio *folio) {}
 
 #define split_huge_pud(__vma, __pmd, __address)	\
 	do { } while (0)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0fd9bebce54c..d1c5e2d694b2 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -703,6 +703,30 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
 		spin_unlock(pvmw->ptl);
 }
 
+/**
+ * page_vma_mapped_walk_restart - Restart the page table walk.
+ * @pvmw: Pointer to struct page_vma_mapped_walk.
+ *
+ * It restarts the page table walk when changes occur in the page
+ * table, such as splitting a PMD. Ensures that the PTL held during
+ * the previous walk is released and resets the state to allow for
+ * a new walk starting at the current address stored in pvmw->address.
+ */
+static inline void
+page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
+{
+	WARN_ON_ONCE(!pvmw->pmd && !pvmw->pte);
+
+	if (likely(pvmw->ptl))
+		spin_unlock(pvmw->ptl);
+	else
+		WARN_ON_ONCE(1);
+
+	pvmw->ptl = NULL;
+	pvmw->pmd = NULL;
+	pvmw->pte = NULL;
+}
+
 bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw);
 
 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 70d20fefc6db..e766d3f3a302 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2582,6 +2582,27 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	pmd_populate(mm, pmd, pgtable);
 }
 
+void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
+			   pmd_t *pmd, bool freeze, struct folio *folio)
+{
+	VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
+	VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
+	VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
+	VM_BUG_ON(freeze && !folio);
+
+	/*
+	 * When the caller requests to set up a migration entry, we
+	 * require a folio to check the PMD against. Otherwise, there
+	 * is a risk of replacing the wrong folio.
+	 */
+	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
+	    is_pmd_migration_entry(*pmd)) {
+		if (folio && folio != pmd_folio(*pmd))
+			return;
+		__split_huge_pmd_locked(vma, pmd, address, freeze);
+	}
+}
+
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze, struct folio *folio)
 {
@@ -2593,26 +2614,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 				(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pmd_lock(vma->vm_mm, pmd);
-
-	/*
-	 * If caller asks to setup a migration entry, we need a folio to check
-	 * pmd against. Otherwise we can end up replacing wrong folio.
-	 */
-	VM_BUG_ON(freeze && !folio);
-	VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
-
-	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
-	    is_pmd_migration_entry(*pmd)) {
-		/*
-		 * It's safe to call pmd_page when folio is set because it's
-		 * guaranteed that pmd is present.
-		 */
-		if (folio && folio != pmd_folio(*pmd))
-			goto out;
-		__split_huge_pmd_locked(vma, pmd, range.start, freeze);
-	}
-
-out:
+	split_huge_pmd_locked(vma, range.start, pmd, freeze, folio);
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(&range);
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index 2d778725e4f5..dacf24bc82f0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1642,9 +1642,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	if (flags & TTU_SYNC)
 		pvmw.flags = PVMW_SYNC;
 
-	if (flags & TTU_SPLIT_HUGE_PMD)
-		split_huge_pmd_address(vma, address, false, folio);
-
 	/*
 	 * For THP, we have to assume the worse case ie pmd for invalidation.
 	 * For hugetlb, it could be much worse if we need to do pud
@@ -1670,9 +1667,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 
 	while (page_vma_mapped_walk(&pvmw)) {
-		/* Unexpected PMD-mapped THP? */
-		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
-
 		/*
 		 * If the folio is in an mlock()d vma, we must not swap it out.
 		 */
@@ -1684,6 +1678,21 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			goto walk_abort;
 		}
 
+		if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
+			/*
+			 * We temporarily have to drop the PTL and start once
+			 * again from that now-PTE-mapped page table.
+			 */
+			split_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
+					      false, folio);
+			flags &= ~TTU_SPLIT_HUGE_PMD;
+			page_vma_mapped_walk_restart(&pvmw);
+			continue;
+		}
+
+		/* Unexpected PMD-mapped THP? */
+		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
+
 		pfn = pte_pfn(ptep_get(pvmw.pte));
 		subpage = folio_page(folio, pfn - folio_pfn(folio));
 		address = pvmw.address;
-- 
2.33.1




* [PATCH v8 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
  2024-06-14  1:51 [PATCH v8 0/3] Reclaim lazyfree THP without splitting Lance Yang
  2024-06-14  1:51 ` [PATCH v8 1/3] mm/rmap: remove duplicated exit code in pagewalk loop Lance Yang
  2024-06-14  1:51 ` [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into " Lance Yang
@ 2024-06-14  1:51 ` Lance Yang
  2024-06-17 18:04   ` David Hildenbrand
  2 siblings, 1 reply; 11+ messages in thread
From: Lance Yang @ 2024-06-14  1:51 UTC (permalink / raw)
  To: akpm
  Cc: willy, sj, baolin.wang, maskray, ziy, ryan.roberts, david,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel, Lance Yang

When the user no longer requires the pages, they use madvise(MADV_FREE)
to mark the pages as lazy free. Typically, they will not write to that
memory again.

During memory reclaim, if we detect that the large folio and its PMD are
both still clean and there are no unexpected references (such as GUP),
we can simply discard the memory lazily, improving the efficiency of
memory reclamation in this case.

On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
mem_cgroup_force_empty() results in the following runtimes in seconds
(shorter is better):

--------------------------------------------
|     Old       |      New       |  Change  |
--------------------------------------------
|   0.683426    |    0.049197    |  -92.80% |
--------------------------------------------

Suggested-by: Zi Yan <ziy@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 include/linux/huge_mm.h |  9 +++++
 mm/huge_memory.c        | 76 +++++++++++++++++++++++++++++++++++++++++
 mm/rmap.c               | 27 +++++++++------
 3 files changed, 102 insertions(+), 10 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 9f720b0731c4..212cca384d7e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -430,6 +430,8 @@ static inline bool thp_migration_supported(void)
 
 void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 			   pmd_t *pmd, bool freeze, struct folio *folio);
+bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
+			   pmd_t *pmdp, struct folio *folio);
 
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -497,6 +499,13 @@ static inline void split_huge_pmd_locked(struct vm_area_struct *vma,
 					 unsigned long address, pmd_t *pmd,
 					 bool freeze, struct folio *folio) {}
 
+static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
+					 unsigned long addr, pmd_t *pmdp,
+					 struct folio *folio)
+{
+	return false;
+}
+
 #define split_huge_pud(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e766d3f3a302..425374ae06ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2688,6 +2688,82 @@ static void unmap_folio(struct folio *folio)
 	try_to_unmap_flush();
 }
 
+static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
+					    unsigned long addr, pmd_t *pmdp,
+					    struct folio *folio)
+{
+	VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+	struct mm_struct *mm = vma->vm_mm;
+	int ref_count, map_count;
+	pmd_t orig_pmd = *pmdp;
+	struct page *page;
+
+	if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
+		return false;
+
+	page = pmd_page(orig_pmd);
+	if (unlikely(page_folio(page) != folio))
+		return false;
+
+	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
+		return false;
+
+	orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
+
+	/*
+	 * Syncing against concurrent GUP-fast:
+	 * - clear PMD; barrier; read refcount
+	 * - inc refcount; barrier; read PMD
+	 */
+	smp_mb();
+
+	ref_count = folio_ref_count(folio);
+	map_count = folio_mapcount(folio);
+
+	/*
+	 * Order reads for folio refcount and dirty flag
+	 * (see comments in __remove_mapping()).
+	 */
+	smp_rmb();
+
+	/*
+	 * If the folio or its PMD is redirtied at this point, or if there
+	 * are unexpected references, we will give up to discard this folio
+	 * and remap it.
+	 *
+	 * The only folio refs must be one from isolation plus the rmap(s).
+	 */
+	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
+	    ref_count != map_count + 1) {
+		set_pmd_at(mm, addr, pmdp, orig_pmd);
+		return false;
+	}
+
+	folio_remove_rmap_pmd(folio, page, vma);
+	zap_deposited_table(mm, pmdp);
+	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	if (vma->vm_flags & VM_LOCKED)
+		mlock_drain_local();
+	folio_put(folio);
+
+	return true;
+}
+
+bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
+			   pmd_t *pmdp, struct folio *folio)
+{
+	VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
+
+	if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
+		return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
+
+	return false;
+}
+
 static void remap_page(struct folio *folio, unsigned long nr)
 {
 	int i = 0;
diff --git a/mm/rmap.c b/mm/rmap.c
index dacf24bc82f0..7d97806f74cd 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1678,16 +1678,23 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			goto walk_abort;
 		}
 
-		if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
-			/*
-			 * We temporarily have to drop the PTL and start once
-			 * again from that now-PTE-mapped page table.
-			 */
-			split_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
-					      false, folio);
-			flags &= ~TTU_SPLIT_HUGE_PMD;
-			page_vma_mapped_walk_restart(&pvmw);
-			continue;
+		if (!pvmw.pte) {
+			if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
+						  folio))
+				goto walk_done;
+
+			if (flags & TTU_SPLIT_HUGE_PMD) {
+				/*
+				 * We temporarily have to drop the PTL and start
+				 * once again from that now-PTE-mapped page
+				 * table.
+				 */
+				split_huge_pmd_locked(vma, pvmw.address,
+						      pvmw.pmd, false, folio);
+				flags &= ~TTU_SPLIT_HUGE_PMD;
+				page_vma_mapped_walk_restart(&pvmw);
+				continue;
+			}
 		}
 
 		/* Unexpected PMD-mapped THP? */
-- 
2.33.1




* Re: [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
  2024-06-14  1:51 ` [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into " Lance Yang
@ 2024-06-14  7:34   ` David Hildenbrand
  2024-06-14  7:46     ` Lance Yang
  2024-06-14 14:26   ` Zi Yan
  1 sibling, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-06-14  7:34 UTC (permalink / raw)
  To: Lance Yang, akpm
  Cc: willy, sj, baolin.wang, maskray, ziy, ryan.roberts, 21cnbao,
	mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
	wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
	linux-kernel

On 14.06.24 03:51, Lance Yang wrote:
> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> folios, start the pagewalk first, then call split_huge_pmd_address() to
> split the folio.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---

Would have converted that VM_BUG_ON to a VM_WARN_ON_ONCE, but it's just 
moving code, so no big deal.

Thanks!

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
  2024-06-14  7:34   ` David Hildenbrand
@ 2024-06-14  7:46     ` Lance Yang
  0 siblings, 0 replies; 11+ messages in thread
From: Lance Yang @ 2024-06-14  7:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, willy, sj, baolin.wang, maskray, ziy, ryan.roberts,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel

On Fri, Jun 14, 2024 at 3:34 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 14.06.24 03:51, Lance Yang wrote:
> > In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> > folios, start the pagewalk first, then call split_huge_pmd_address() to
> > split the folio.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
>
> Would have converted that VM_BUG_ON to a VM_WARN_ON_ONCE, but it's just
> moving code, so no big deal.

OK, let’s leave it as is for now ;)

>
> Thanks!
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks for taking time to review!
Lance

>
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
  2024-06-14  1:51 ` [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into " Lance Yang
  2024-06-14  7:34   ` David Hildenbrand
@ 2024-06-14 14:26   ` Zi Yan
  2024-06-14 14:41     ` Lance Yang
  1 sibling, 1 reply; 11+ messages in thread
From: Zi Yan @ 2024-06-14 14:26 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, willy, sj, baolin.wang, maskray, ryan.roberts, david,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel


On 13 Jun 2024, at 21:51, Lance Yang wrote:

> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> folios, start the pagewalk first, then call split_huge_pmd_address() to
> split the folio.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  include/linux/huge_mm.h |  6 ++++++
>  include/linux/rmap.h    | 24 +++++++++++++++++++++++
>  mm/huge_memory.c        | 42 +++++++++++++++++++++--------------------
>  mm/rmap.c               | 21 +++++++++++++++------
>  4 files changed, 67 insertions(+), 26 deletions(-)

Thanks.

Acked-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi



* Re: [PATCH v8 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
  2024-06-14 14:26   ` Zi Yan
@ 2024-06-14 14:41     ` Lance Yang
  0 siblings, 0 replies; 11+ messages in thread
From: Lance Yang @ 2024-06-14 14:41 UTC (permalink / raw)
  To: Zi Yan
  Cc: akpm, willy, sj, baolin.wang, maskray, ryan.roberts, david,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel

On Fri, Jun 14, 2024 at 10:26 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 13 Jun 2024, at 21:51, Lance Yang wrote:
>
> > In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> > folios, start the pagewalk first, then call split_huge_pmd_address() to
> > split the folio.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >  include/linux/huge_mm.h |  6 ++++++
> >  include/linux/rmap.h    | 24 +++++++++++++++++++++++
> >  mm/huge_memory.c        | 42 +++++++++++++++++++++--------------------
> >  mm/rmap.c               | 21 +++++++++++++++------
> >  4 files changed, 67 insertions(+), 26 deletions(-)
>
> Thanks.
>
> Acked-by: Zi Yan <ziy@nvidia.com>

Thanks for taking time to review!
Lance

>
> --
> Best Regards,
> Yan, Zi



* Re: [PATCH v8 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
  2024-06-14  1:51 ` [PATCH v8 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list() Lance Yang
@ 2024-06-17 18:04   ` David Hildenbrand
  2024-06-18  1:56     ` Lance Yang
  2024-06-22 10:00     ` Lance Yang
  0 siblings, 2 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-06-17 18:04 UTC (permalink / raw)
  To: Lance Yang, akpm
  Cc: willy, sj, baolin.wang, maskray, ziy, ryan.roberts, 21cnbao,
	mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
	wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
	linux-kernel

Sorry for taking so long to review ... getting there. Mostly nits.

> @@ -497,6 +499,13 @@ static inline void split_huge_pmd_locked(struct vm_area_struct *vma,
>   					 unsigned long address, pmd_t *pmd,
>   					 bool freeze, struct folio *folio) {}
>   
> +static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
> +					 unsigned long addr, pmd_t *pmdp,
> +					 struct folio *folio)
> +{
> +	return false;
> +}
> +
>   #define split_huge_pud(__vma, __pmd, __address)	\
>   	do { } while (0)
>   
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e766d3f3a302..425374ae06ed 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2688,6 +2688,82 @@ static void unmap_folio(struct folio *folio)
>   	try_to_unmap_flush();
>   }
>   
> +static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
> +					    unsigned long addr, pmd_t *pmdp,
> +					    struct folio *folio)
> +{
> +	VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
> +	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

I would drop these (that's exactly what the single caller checks). In 
any case don't place them above the variable declaration ;)

> +
> +	struct mm_struct *mm = vma->vm_mm;
> +	int ref_count, map_count;
> +	pmd_t orig_pmd = *pmdp;
> +	struct page *page;
> +
> +	if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
> +		return false;
> +
> +	page = pmd_page(orig_pmd);
> +	if (unlikely(page_folio(page) != folio))
> +		return false;

I'm curious, how could that happen? And how could it happen that we have 
!pmd_trans_huge() ? Didn't rmap walking code make sure that this PMD 
maps the folio already, and we are holding the PTL?

> +
> +	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
> +		return false;
> +
> +	orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
> +
> +	/*
> +	 * Syncing against concurrent GUP-fast:
> +	 * - clear PMD; barrier; read refcount
> +	 * - inc refcount; barrier; read PMD
> +	 */
> +	smp_mb();
> +
> +	ref_count = folio_ref_count(folio);
> +	map_count = folio_mapcount(folio);
> +
> +	/*
> +	 * Order reads for folio refcount and dirty flag
> +	 * (see comments in __remove_mapping()).
> +	 */
> +	smp_rmb();
> +
> +	/*
> +	 * If the folio or its PMD is redirtied at this point, or if there
> +	 * are unexpected references, we will give up to discard this folio
> +	 * and remap it.
> +	 *
> +	 * The only folio refs must be one from isolation plus the rmap(s).
> +	 */
> +	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
> +	    ref_count != map_count + 1) {
> +		set_pmd_at(mm, addr, pmdp, orig_pmd);
> +		return false;
> +	}
> +
> +	folio_remove_rmap_pmd(folio, page, vma);
> +	zap_deposited_table(mm, pmdp);
> +	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> +	if (vma->vm_flags & VM_LOCKED)
> +		mlock_drain_local();
> +	folio_put(folio);
> +
> +	return true;
> +}
> +
> +bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> +			   pmd_t *pmdp, struct folio *folio)
> +{
> +	VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
> +	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> +	VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
> +
> +	if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
> +		return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
> +
> +	return false;
> +}
> +
>   static void remap_page(struct folio *folio, unsigned long nr)
>   {
>   	int i = 0;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index dacf24bc82f0..7d97806f74cd 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1678,16 +1678,23 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>   			goto walk_abort;
>   		}
>   
> -		if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
> -			/*
> -			 * We temporarily have to drop the PTL and start once
> -			 * again from that now-PTE-mapped page table.
> -			 */
> -			split_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
> -					      false, folio);
> -			flags &= ~TTU_SPLIT_HUGE_PMD;
> -			page_vma_mapped_walk_restart(&pvmw);
> -			continue;
> +		if (!pvmw.pte) {
> +			if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
> +						  folio))
> +				goto walk_done;
> +
> +			if (flags & TTU_SPLIT_HUGE_PMD) {
> +				/*
> +				 * We temporarily have to drop the PTL and start
> +				 * once again from that now-PTE-mapped page
> +				 * table.

Nit: it's not a PTE-mapped page table.

Maybe

"... restart so we can process the PTE-mapped THP."



>   		}
>   
>   		/* Unexpected PMD-mapped THP? */

Nothing else jumped at me :)

-- 
Cheers,

David / dhildenb




* Re: [PATCH v8 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
  2024-06-17 18:04   ` David Hildenbrand
@ 2024-06-18  1:56     ` Lance Yang
  2024-06-22 10:00     ` Lance Yang
  1 sibling, 0 replies; 11+ messages in thread
From: Lance Yang @ 2024-06-18  1:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, willy, sj, baolin.wang, maskray, ziy, ryan.roberts,
	21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
	libang.li, wangkefeng.wang, songmuchun, peterx, minchan,
	linux-mm, linux-kernel

On Tue, Jun 18, 2024 at 2:04 AM David Hildenbrand <david@redhat.com> wrote:
>
> Sorry for taking so long to review ... getting there. Mostly nits.

No worries at all :)

Thanks for taking time to review!

>
> > @@ -497,6 +499,13 @@ static inline void split_huge_pmd_locked(struct vm_area_struct *vma,
> >                                        unsigned long address, pmd_t *pmd,
> >                                        bool freeze, struct folio *folio) {}
> >
> > +static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
> > +                                      unsigned long addr, pmd_t *pmdp,
> > +                                      struct folio *folio)
> > +{
> > +     return false;
> > +}
> > +
> >   #define split_huge_pud(__vma, __pmd, __address)     \
> >       do { } while (0)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e766d3f3a302..425374ae06ed 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2688,6 +2688,82 @@ static void unmap_folio(struct folio *folio)
> >       try_to_unmap_flush();
> >   }
> >
> > +static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
> > +                                         unsigned long addr, pmd_t *pmdp,
> > +                                         struct folio *folio)
> > +{
> > +     VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
> > +     VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> I would drop these (that's exactly what the single caller checks). In

Agreed. I will drop these.

> any case don't place them above the variable declaration ;)

Yep, I see.

>
> > +
> > +     struct mm_struct *mm = vma->vm_mm;
> > +     int ref_count, map_count;
> > +     pmd_t orig_pmd = *pmdp;
> > +     struct page *page;
> > +
> > +     if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
> > +             return false;
> > +
> > +     page = pmd_page(orig_pmd);
> > +     if (unlikely(page_folio(page) != folio))
> > +             return false;
>
> I'm curious, how could that happen? And how could it happen that we have
> !pmd_trans_huge() ? Didn't rmap walking code make sure that this PMD
> maps the folio already, and we are holding the PTL?

Makes sense to me. I was adding these just in case, but it's probably too much.

Let's drop them ;)

>
> > +
> > +     if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
> > +             return false;
> > +
> > +     orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
> > +
> > +     /*
> > +      * Syncing against concurrent GUP-fast:
> > +      * - clear PMD; barrier; read refcount
> > +      * - inc refcount; barrier; read PMD
> > +      */
> > +     smp_mb();
> > +
> > +     ref_count = folio_ref_count(folio);
> > +     map_count = folio_mapcount(folio);
> > +
> > +     /*
> > +      * Order reads for folio refcount and dirty flag
> > +      * (see comments in __remove_mapping()).
> > +      */
> > +     smp_rmb();
> > +
> > +     /*
> > +      * If the folio or its PMD is redirtied at this point, or if there
> > +      * are unexpected references, we will give up to discard this folio
> > +      * and remap it.
> > +      *
> > +      * The only folio refs must be one from isolation plus the rmap(s).
> > +      */
> > +     if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
> > +         ref_count != map_count + 1) {
> > +             set_pmd_at(mm, addr, pmdp, orig_pmd);
> > +             return false;
> > +     }
> > +
> > +     folio_remove_rmap_pmd(folio, page, vma);
> > +     zap_deposited_table(mm, pmdp);
> > +     add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > +     if (vma->vm_flags & VM_LOCKED)
> > +             mlock_drain_local();
> > +     folio_put(folio);
> > +
> > +     return true;
> > +}
> > +
> > +bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> > +                        pmd_t *pmdp, struct folio *folio)
> > +{
> > +     VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
> > +     VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > +     VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
> > +
> > +     if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
> > +             return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
> > +
> > +     return false;
> > +}
> > +
> >   static void remap_page(struct folio *folio, unsigned long nr)
> >   {
> >       int i = 0;
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index dacf24bc82f0..7d97806f74cd 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1678,16 +1678,23 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >                       goto walk_abort;
> >               }
> >
> > -             if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
> > -                     /*
> > -                      * We temporarily have to drop the PTL and start once
> > -                      * again from that now-PTE-mapped page table.
> > -                      */
> > -                     split_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
> > -                                           false, folio);
> > -                     flags &= ~TTU_SPLIT_HUGE_PMD;
> > -                     page_vma_mapped_walk_restart(&pvmw);
> > -                     continue;
> > +             if (!pvmw.pte) {
> > +                     if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
> > +                                               folio))
> > +                             goto walk_done;
> > +
> > +                     if (flags & TTU_SPLIT_HUGE_PMD) {
> > +                             /*
> > +                              * We temporarily have to drop the PTL and start
> > +                              * once again from that now-PTE-mapped page
> > +                              * table.
>
> Nit: it's not a PTE-mapped page table.
>
> Maybe
>
> "... restart so we can process the PTE-mapped THP."

Nice. Will adjust as you suggested.

>
>
>
> >               }
> >
> >               /* Unexpected PMD-mapped THP? */
>
> Nothing else jumped at me :)

Thanks again for your time!
Lance

>
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v8 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
  2024-06-17 18:04   ` David Hildenbrand
  2024-06-18  1:56     ` Lance Yang
@ 2024-06-22 10:00     ` Lance Yang
  1 sibling, 0 replies; 11+ messages in thread
From: Lance Yang @ 2024-06-22 10:00 UTC (permalink / raw)
  To: david, akpm
  Cc: 21cnbao, baolin.wang, fengwei.yin, ioworker0, libang.li,
	linux-kernel, linux-mm, maskray, mhocko, minchan, peterx,
	ryan.roberts, shy828301, sj, songmuchun, wangkefeng.wang, willy,
	xiehuan09, ziy, zokeefe

Hi Andrew,

I made some minor changes suggested by David[1]. Could you please fold the
following changes into this patch?

[1] https://lore.kernel.org/linux-mm/e7c0aff1-b690-4926-9a34-4e32c9f3faaa@redhat.com/

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4b2817bb2c7d..0cb52ae29259 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2693,21 +2693,11 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
 					    unsigned long addr, pmd_t *pmdp,
 					    struct folio *folio)
 {
-	VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
-	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
-
 	struct mm_struct *mm = vma->vm_mm;
 	int ref_count, map_count;
 	pmd_t orig_pmd = *pmdp;
 	struct page *page;
 
-	if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
-		return false;
-
-	page = pmd_page(orig_pmd);
-	if (unlikely(page_folio(page) != folio))
-		return false;
-
 	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
 		return false;
 
diff --git a/mm/rmap.c b/mm/rmap.c
index df1a43295c85..b358501fb7e8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1678,9 +1678,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 			if (flags & TTU_SPLIT_HUGE_PMD) {
 				/*
-				 * We temporarily have to drop the PTL and start
-				 * once again from that now-PTE-mapped page
-				 * table.
+				 * We temporarily have to drop the PTL and
+				 * restart so we can process the PTE-mapped THP.
 				 */
 				split_huge_pmd_locked(vma, pvmw.address,
 						      pvmw.pmd, false, folio);
-- 

Thanks,
Lance


