* [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
@ 2024-02-25 12:32 Lance Yang
From: Lance Yang @ 2024-02-25 12:32 UTC
  To: akpm
  Cc: zokeefe, shy828301, david, mhocko, ryan.roberts, wangkefeng.wang,
	songmuchun, peterx, minchan, linux-mm, linux-kernel, Lance Yang

This patch improves madvise_free_pte_range() so that it
correctly handles large folios that are smaller than PMD size
(for example, 16KiB to 1024KiB[1]). This is part of the
preparation to support anonymous multi-size THP (mTHP).

Additionally, when consecutive PTEs map consecutive pages of
the same large folio (mTHP) and that folio is locked before
madvise(MADV_FREE) or cannot be split, the current code breaks
out of the loop, so all subsequent PTEs within the same PMD
are skipped even though they should have been MADV_FREEed.
With this patch, only the PTEs of that folio are skipped.
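
The difference between breaking out of the loop and skipping a single folio can be illustrated with a small userspace model. This is only a sketch: `struct model_pte`, `count_freed_old()` and `count_freed_new()` are hypothetical names, not kernel code, and the model assumes each large folio is naturally aligned within the PTE array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one PTE; this is not a kernel structure. */
struct model_pte {
	unsigned int folio_pages;	/* pages in the backing folio (1 = small) */
	int busy;			/* nonzero: folio is locked or cannot be split */
};

/* Old behaviour: the walk stops at the first busy large folio. */
static size_t count_freed_old(const struct model_pte *ptes, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (ptes[i].folio_pages > 1 && ptes[i].busy)
			break;
		freed++;
	}
	return freed;
}

/* New behaviour: jump past the busy folio and keep walking. */
static size_t count_freed_new(const struct model_pte *ptes, size_t n)
{
	size_t freed = 0;
	size_t i = 0;

	while (i < n) {
		assert(ptes[i].folio_pages > 0);
		if (ptes[i].folio_pages > 1 && ptes[i].busy) {
			/* Next folio boundary, assuming natural alignment. */
			i = (i / ptes[i].folio_pages + 1) * ptes[i].folio_pages;
			continue;
		}
		freed++;
		i++;
	}
	return freed;
}
```

With a busy 4-page folio followed by four small pages, the old walk processes nothing, while the new walk still processes the four small pages.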

This patch also optimizes lazyfreeing of PTE-mapped mTHP
(inspired by David Hildenbrand[2]): if the large folio lies
entirely within the given range, it is marked lazyfree as a
whole, which avoids an unnecessary folio split.
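
The "entirely within the given range" test in the diff reduces to address arithmetic, which can be tried in userspace. The sketch below mirrors the `next_addr`/`align` computation from the patch. `SKETCH_PAGE_SIZE`, `SKETCH_ALIGN_DOWN()` and `folio_fully_in_range()` are hypothetical stand-ins, the 4KiB page size is an assumption, and, as in the patch, the folio is assumed to be naturally aligned in the virtual address space.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PAGE_SIZE and ALIGN_DOWN(). */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

/*
 * Returns true when a folio of nr_pages pages starts exactly at addr
 * and ends at or before end, i.e. no split is needed.
 */
static bool folio_fully_in_range(unsigned long addr, unsigned long end,
				 unsigned long nr_pages)
{
	unsigned long align = nr_pages * SKETCH_PAGE_SIZE;
	unsigned long next_addr;

	/* The mask trick only works for power-of-two folio sizes. */
	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);

	next_addr = SKETCH_ALIGN_DOWN(addr + align, align);

	/* Same condition as the patch, inverted: true means "do not split". */
	return next_addr <= end && next_addr - addr == align;
}
```

For a 64KiB folio (16 pages), a range starting on the folio boundary and covering it fully passes the check, while a range that starts in the middle of the folio or ends before the folio does falls back to the split path.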

On an Intel i5 CPU, lazyfreeing a 1GiB VMA backed by
PTE-mapped folios of a single size (one size per row) results
in the following runtimes for madvise(MADV_FREE), in seconds
(shorter is better):

Folio Size  |    Old     |    New     |  Change
----------------------------------------------
      4KiB  |  0.590251  |  0.590264  |     0%
     16KiB  |  2.990447  |  0.182167  |   -94%
     32KiB  |  2.547831  |  0.101622  |   -96%
     64KiB  |  2.457796  |  0.049726  |   -98%
    128KiB  |  2.281034  |  0.030109  |   -99%
    256KiB  |  2.230387  |  0.015838  |   -99%
    512KiB  |  2.189106  |  0.009149  |   -99%
   1024KiB  |  2.183949  |  0.006620  |   -99%
   2048KiB  |  0.002799  |  0.002795  |     0%
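
A measurement of this kind could be taken with a small userspace harness along the following lines. This is a sketch and not the program used to produce the table above: `time_madv_free()` and `percent_change()` are hypothetical helpers, and the sketch does not control the folio size, which depends on the system's THP configuration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* Linux value; fallback for older libc headers */
#endif

/*
 * Returns the seconds spent in madvise(MADV_FREE) over an anonymous
 * mapping of len bytes, or -1.0 if mmap() or madvise() fails.
 */
static double time_madv_free(size_t len)
{
	struct timespec t0, t1;
	int ret;
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return -1.0;

	/* Touch every page so the PTEs are populated and dirty. */
	memset(buf, 1, len);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	ret = madvise(buf, len, MADV_FREE);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, len);
	if (ret != 0)
		return -1.0;

	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Relative change in percent, as in the "Change" column. */
static double percent_change(double old_s, double new_s)
{
	assert(old_s > 0.0);
	return (new_s - old_s) / old_s * 100.0;
}
```

Calling `time_madv_free(1UL << 30)` on kernels with and without the patch and passing the two results to `percent_change()` gives a figure comparable to the last column. For the 16KiB row, `percent_change(2.990447, 0.182167)` is roughly -94.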

[1] https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
[2] https://lore.kernel.org/linux-mm/20240214204435.167852-1-david@redhat.com/

Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 mm/madvise.c | 69 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 58 insertions(+), 11 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index cfa5e7288261..bcbf56595a2e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -676,11 +676,43 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		 */
 		if (folio_test_large(folio)) {
 			int err;
+			unsigned long next_addr, align;
 
-			if (folio_estimated_sharers(folio) != 1)
-				break;
-			if (!folio_trylock(folio))
-				break;
+			align = folio_nr_pages(folio) * PAGE_SIZE;
+			next_addr = ALIGN_DOWN(addr + align, align);
+
+			if (folio_estimated_sharers(folio) != 1 ||
+			    !folio_trylock(folio))
+				goto skip_large_folio;
+
+			/*
+			 * If only part of the large folio lies within
+			 * the given range, split the large folio.
+			 */
+			if (next_addr > end || next_addr - addr != align)
+				goto split_large_folio;
+
+			/*
+			 * Avoid unnecessary folio splitting if the large
+			 * folio is entirely within the given range.
+			 */
+			folio_test_clear_dirty(folio);
+			folio_unlock(folio);
+			for (; addr != next_addr; pte++, addr += PAGE_SIZE) {
+				ptent = ptep_get(pte);
+				if (pte_young(ptent) || pte_dirty(ptent)) {
+					ptent = ptep_get_and_clear_full(
+						mm, addr, pte, tlb->fullmm);
+					ptent = pte_mkold(ptent);
+					ptent = pte_mkclean(ptent);
+					set_pte_at(mm, addr, pte, ptent);
+					tlb_remove_tlb_entry(tlb, pte, addr);
+				}
+			}
+			folio_mark_lazyfree(folio);
+			goto next_folio;
+
+split_large_folio:
 			folio_get(folio);
 			arch_leave_lazy_mmu_mode();
 			pte_unmap_unlock(start_pte, ptl);
@@ -688,13 +720,28 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			err = split_folio(folio);
 			folio_unlock(folio);
 			folio_put(folio);
-			if (err)
-				break;
-			start_pte = pte =
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
-			if (!start_pte)
-				break;
-			arch_enter_lazy_mmu_mode();
+
+			/*
+			 * If the large folio is locked before madvise(MADV_FREE)
+			 * or cannot be split, we just skip it.
+			 */
+			if (err) {
+skip_large_folio:
+				if (next_addr >= end)
+					break;
+				pte += (next_addr - addr) / PAGE_SIZE;
+				addr = next_addr;
+			}
+
+			if (!start_pte) {
+				start_pte = pte = pte_offset_map_lock(
+					mm, pmd, addr, &ptl);
+				if (!start_pte)
+					break;
+				arch_enter_lazy_mmu_mode();
+			}
+
+next_folio:
 			pte--;
 			addr -= PAGE_SIZE;
 			continue;
-- 
2.33.1