linux-mm.kvack.org archive mirror
* [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation
@ 2025-01-06  3:17 Barry Song
  2025-01-06  3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06  3:17 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
	ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song

From: Barry Song <v-songbaohua@oppo.com>

Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during 
shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c. 
However, those folios are still added to the deferred_split list in 
try_to_unmap_one() because we are unmapping PTEs and removing rmap entries 
one by one. This approach is not only slow but also increases the risk of a 
race condition where lazyfree folios are incorrectly set back to swapbacked, 
as a speculative folio_get may occur in the shrinker's callback.

This patchset addresses the issue by only marking truly dirty folios as
swapbacked, as suggested by David, and by shifting to batched unmapping
of the entire folio in try_to_unmap_one(). As a result, we've observed
deferred_split dropping to zero and significant performance improvements
in memory reclamation.
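
For readers skimming the cover letter, the lazyfree handling in
try_to_unmap_one() after this series boils down to the following
(condensed from patch 1/3 below; locking, TLB handling and most
comments are trimmed):

	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
		/* redirtied: remap the PTE and fall back to swap-backed */
		set_pte_at(mm, address, pvmw.pte, pteval);
		folio_set_swapbacked(folio);
		goto walk_abort;
	} else if (ref_count != 1 + map_count) {
		/* extra (speculative or GUP) reference: remap and retry later */
		set_pte_at(mm, address, pvmw.pte, pteval);
		goto walk_abort;
	}
	/* clean and unreferenced: discard the lazyfree folio */
	dec_mm_counter(mm, MM_ANONPAGES);
	goto discard;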

Barry Song (3):
  mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  mm: Support tlbbatch flush for a range of PTEs
  mm: Support batched unmap for lazyfree large folios during reclamation

 arch/arm64/include/asm/tlbflush.h |  26 ++++----
 arch/arm64/mm/contpte.c           |   2 +-
 arch/x86/include/asm/tlbflush.h   |   3 +-
 mm/rmap.c                         | 103 ++++++++++++++++++++----------
 4 files changed, 85 insertions(+), 49 deletions(-)

-- 
2.39.3 (Apple Git-146)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06  3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
@ 2025-01-06  3:17 ` Barry Song
  2025-01-06  6:40   ` Baolin Wang
  2025-01-06  3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-06  3:17 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
	ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song

From: Barry Song <v-songbaohua@oppo.com>

The refcount may be temporarily or long-term increased, but this does
not change the fundamental nature of the folio already being lazy-
freed. Therefore, we only reset 'swapbacked' when we are certain the
folio is dirty and not droppable.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				smp_rmb();
 
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
+				if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+					/*
+					 * redirtied either using the page table or a previously
+					 * obtained GUP reference.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
 					folio_set_swapbacked(folio);
-				goto walk_abort;
+					goto walk_abort;
+				} else if (ref_count != 1 + map_count) {
+					/*
+					 * Additional reference. Could be a GUP reference or any
+					 * speculative reference. GUP users must mark the folio
+					 * dirty if there was a modification. This folio cannot be
+					 * reclaimed right now either way, so act just like nothing
+					 * happened.
+					 * We'll come back here later and detect if the folio was
+					 * dirtied when the additional reference is gone.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
+					goto walk_abort;
+				}
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
 			}
 
 			if (swap_duplicate(entry) < 0) {
-- 
2.39.3 (Apple Git-146)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
  2025-01-06  3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
  2025-01-06  3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
@ 2025-01-06  3:17 ` Barry Song
  2025-01-06  8:22   ` kernel test robot
  2025-01-06 10:07   ` kernel test robot
  2025-01-06  3:17 ` [PATCH 3/3] mm: Support batched unmap for lazyfree large folios during reclamation Barry Song
  2025-01-06 17:28 ` [PATCH 0/3] mm: batched unmap " Lorenzo Stoakes
  3 siblings, 2 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06  3:17 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
	ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Anshuman Khandual, Shaoqin Huang, Gavin Shan,
	Kefeng Wang, Mark Rutland, Kirill A. Shutemov, Yosry Ahmed

From: Barry Song <v-songbaohua@oppo.com>

This is a preparatory patch to support batch PTE unmapping in
`try_to_unmap_one`. It first introduces range handling for the
`tlbbatch` flush. For now, the range is always set to PAGE_SIZE.
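
Concretely, the batched-flush hook and its wrapper in mm/rmap.c gain a
size argument, and the existing single-PTE callers simply pass
PAGE_SIZE, so behaviour does not change yet (condensed from the diff
below):

	static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
					      unsigned long uaddr,
					      unsigned long size)
	{
		...
		arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
		...
	}

	/* existing callers in try_to_unmap_one()/try_to_migrate_one() */
	set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);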

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 arch/arm64/include/asm/tlbflush.h | 26 ++++++++++++++------------
 arch/arm64/mm/contpte.c           |  2 +-
 arch/x86/include/asm/tlbflush.h   |  3 ++-
 mm/rmap.c                         | 12 +++++++-----
 4 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..f34e4fab5aa2 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 	return true;
 }
 
-static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
-					     struct mm_struct *mm,
-					     unsigned long uaddr)
-{
-	__flush_tlb_page_nosync(mm, uaddr);
-}
-
 /*
  * If mprotect/munmap/etc occurs during TLB batched flushing, we need to
  * synchronise all the TLBI issued with a DSB to avoid the race mentioned in
@@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsigned long start,
 	return false;
 }
 
-static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
+static inline void __flush_tlb_range_nosync(struct mm_struct *mm,
 				     unsigned long start, unsigned long end,
 				     unsigned long stride, bool last_level,
 				     int tlb_level)
@@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 	pages = (end - start) >> PAGE_SHIFT;
 
 	if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
-		flush_tlb_mm(vma->vm_mm);
+		flush_tlb_mm(mm);
 		return;
 	}
 
 	dsb(ishst);
-	asid = ASID(vma->vm_mm);
+	asid = ASID(mm);
 
 	if (last_level)
 		__flush_tlb_range_op(vale1is, start, pages, stride, asid,
@@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 		__flush_tlb_range_op(vae1is, start, pages, stride, asid,
 				     tlb_level, true, lpa2_is_enabled());
 
-	mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
+	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
@@ -482,7 +475,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long stride, bool last_level,
 				     int tlb_level)
 {
-	__flush_tlb_range_nosync(vma, start, end, stride,
+	__flush_tlb_range_nosync(vma->vm_mm, start, end, stride,
 				 last_level, tlb_level);
 	dsb(ish);
 }
@@ -533,6 +526,15 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
 	dsb(ish);
 	isb();
 }
+
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+					     struct mm_struct *mm,
+					     unsigned long uaddr,
+					     unsigned long size)
+{
+	__flush_tlb_range_nosync(mm, uaddr, uaddr + size,
+				 PAGE_SIZE, true, 3);
+}
 #endif
 
 #endif
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 55107d27d3f8..bcac4f55f9c1 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -335,7 +335,7 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
 		 * eliding the trailing DSB applies here.
 		 */
 		addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
-		__flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
+		__flush_tlb_range_nosync(vma->vm_mm, addr, addr + CONT_PTE_SIZE,
 					 PAGE_SIZE, true, 3);
 	}
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..cda35f53f544 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -279,7 +279,8 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 
 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 					     struct mm_struct *mm,
-					     unsigned long uaddr)
+					     unsigned long uaddr,
+					     unsignd long size)
 {
 	inc_mm_tlb_gen(mm);
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
diff --git a/mm/rmap.c b/mm/rmap.c
index de6b8c34e98c..365112af5291 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -672,7 +672,8 @@ void try_to_unmap_flush_dirty(void)
 	(TLB_FLUSH_BATCH_PENDING_MASK / 2)
 
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 	int batch;
@@ -681,7 +682,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 	if (!pte_accessible(mm, pteval))
 		return;
 
-	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
+	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
 	tlb_ubc->flush_required = true;
 
 	/*
@@ -757,7 +758,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 }
 #else
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 }
 
@@ -1792,7 +1794,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-				set_tlb_ubc_flush_pending(mm, pteval, address);
+				set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 			} else {
 				pteval = ptep_clear_flush(vma, address, pvmw.pte);
 			}
@@ -2164,7 +2166,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-				set_tlb_ubc_flush_pending(mm, pteval, address);
+				set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 			} else {
 				pteval = ptep_clear_flush(vma, address, pvmw.pte);
 			}
-- 
2.39.3 (Apple Git-146)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/3] mm: Support batched unmap for lazyfree large folios during reclamation
  2025-01-06  3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
  2025-01-06  3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
  2025-01-06  3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
@ 2025-01-06  3:17 ` Barry Song
  2025-01-06 17:28 ` [PATCH 0/3] mm: batched unmap " Lorenzo Stoakes
  3 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06  3:17 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
	ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song

From: Barry Song <v-songbaohua@oppo.com>

Currently, the PTEs and rmap of a large folio are removed one at a time.
This is not only slow but also causes the large folio to be unnecessarily
added to deferred_split, which can lead to races between the
deferred_split shrinker callback and memory reclamation. This patch
releases all PTEs and rmap entries in a batch.
For now, it only handles lazyfree large folios.
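
At the core of the change, a fully-mapped lazyfree large folio has all
of its PTEs cleared in one go and a single range TLB flush queued,
instead of being processed PTE by PTE (condensed from the diff below):

	} else if (folio_test_large(folio) &&
			can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) {
		nr_pages = folio_nr_pages(folio);
		flush_cache_range(vma, range.start, range.end);
		pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
		if (should_defer_flush(mm, flags))
			set_tlb_ubc_flush_pending(mm, pteval, address, folio_size(folio));
		else
			flush_tlb_range(vma, range.start, range.end);
	}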

The microbenchmark below tries to reclaim 128MB of lazyfree large folios
whose size is 64KiB:

 #include <stdio.h>
 #include <sys/mman.h>
 #include <string.h>
 #include <time.h>

 #define SIZE 128*1024*1024  // 128 MB

 unsigned long read_split_deferred()
 {
 	FILE *file = fopen("/sys/kernel/mm/transparent_hugepage"
			"/hugepages-64kB/stats/split_deferred", "r");
 	if (!file) {
 		perror("Error opening file");
 		return 0;
 	}

 	unsigned long value;
 	if (fscanf(file, "%lu", &value) != 1) {
 		perror("Error reading value");
 		fclose(file);
 		return 0;
 	}

 	fclose(file);
 	return value;
 }

 int main(int argc, char *argv[])
 {
 	while(1) {
 		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
 				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

 		memset((void *)p, 1, SIZE);

 		madvise((void *)p, SIZE, MADV_FREE);

 		clock_t start_time = clock();
 		unsigned long start_split = read_split_deferred();
 		madvise((void *)p, SIZE, MADV_PAGEOUT);
 		clock_t end_time = clock();
 		unsigned long end_split = read_split_deferred();

 		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
 		printf("Time taken by reclamation: %f seconds, split_deferred: %ld\n",
 			elapsed_time, end_split - start_split);

 		munmap((void *)p, SIZE);
 	}
 	return 0;
 }

w/o patch:
~ # ./a.out
Time taken by reclamation: 0.177418 seconds, split_deferred: 2048
Time taken by reclamation: 0.178348 seconds, split_deferred: 2048
Time taken by reclamation: 0.174525 seconds, split_deferred: 2048
Time taken by reclamation: 0.171620 seconds, split_deferred: 2048
Time taken by reclamation: 0.172241 seconds, split_deferred: 2048
Time taken by reclamation: 0.174003 seconds, split_deferred: 2048
Time taken by reclamation: 0.171058 seconds, split_deferred: 2048
Time taken by reclamation: 0.171993 seconds, split_deferred: 2048
Time taken by reclamation: 0.169829 seconds, split_deferred: 2048
Time taken by reclamation: 0.172895 seconds, split_deferred: 2048
Time taken by reclamation: 0.176063 seconds, split_deferred: 2048
Time taken by reclamation: 0.172568 seconds, split_deferred: 2048
Time taken by reclamation: 0.171185 seconds, split_deferred: 2048
Time taken by reclamation: 0.170632 seconds, split_deferred: 2048
Time taken by reclamation: 0.170208 seconds, split_deferred: 2048
Time taken by reclamation: 0.174192 seconds, split_deferred: 2048
...

w/ patch:
~ # ./a.out
Time taken by reclamation: 0.074231 seconds, split_deferred: 0
Time taken by reclamation: 0.071026 seconds, split_deferred: 0
Time taken by reclamation: 0.072029 seconds, split_deferred: 0
Time taken by reclamation: 0.071873 seconds, split_deferred: 0
Time taken by reclamation: 0.073573 seconds, split_deferred: 0
Time taken by reclamation: 0.071906 seconds, split_deferred: 0
Time taken by reclamation: 0.073604 seconds, split_deferred: 0
Time taken by reclamation: 0.075903 seconds, split_deferred: 0
Time taken by reclamation: 0.073191 seconds, split_deferred: 0
Time taken by reclamation: 0.071228 seconds, split_deferred: 0
Time taken by reclamation: 0.071391 seconds, split_deferred: 0
Time taken by reclamation: 0.071468 seconds, split_deferred: 0
Time taken by reclamation: 0.071896 seconds, split_deferred: 0
Time taken by reclamation: 0.072508 seconds, split_deferred: 0
Time taken by reclamation: 0.071884 seconds, split_deferred: 0
Time taken by reclamation: 0.072433 seconds, split_deferred: 0
Time taken by reclamation: 0.071939 seconds, split_deferred: 0
...

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/rmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 365112af5291..9424b96f8482 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1642,6 +1642,27 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
 #endif
 }
 
+/* We support batch unmapping of PTEs for lazyfree large folios */
+static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
+			struct folio *folio, pte_t *ptep)
+{
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	int max_nr = folio_nr_pages(folio);
+	pte_t pte = ptep_get(ptep);
+
+	if (pte_none(pte))
+		return false;
+	if (!pte_present(pte))
+		return false;
+	if (!folio_test_anon(folio))
+		return false;
+	if (folio_test_swapbacked(folio))
+		return false;
+
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
+			       NULL, NULL) == max_nr;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1655,6 +1676,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+	int nr_pages = 1;
 	unsigned long pfn;
 	unsigned long hsz = 0;
 
@@ -1780,6 +1802,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				hugetlb_vma_unlock_write(vma);
 			}
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+		} else if (folio_test_large(folio) &&
+				can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) {
+			nr_pages = folio_nr_pages(folio);
+			flush_cache_range(vma, range.start, range.end);
+			pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+			if (should_defer_flush(mm, flags))
+				set_tlb_ubc_flush_pending(mm, pteval, address, folio_size(folio));
+			else
+				flush_tlb_range(vma, range.start, range.end);
 		} else {
 			flush_cache_page(vma, address, pfn);
 			/* Nuke the page table entry. */
@@ -1875,7 +1906,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 					 * redirtied either using the page table or a previously
 					 * obtained GUP reference.
 					 */
-					set_pte_at(mm, address, pvmw.pte, pteval);
+					set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 					folio_set_swapbacked(folio);
 					goto walk_abort;
 				} else if (ref_count != 1 + map_count) {
@@ -1888,10 +1919,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 					 * We'll come back here later and detect if the folio was
 					 * dirtied when the additional reference is gone.
 					 */
-					set_pte_at(mm, address, pvmw.pte, pteval);
+					set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 					goto walk_abort;
 				}
-				dec_mm_counter(mm, MM_ANONPAGES);
+				add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
 				goto discard;
 			}
 
@@ -1943,13 +1974,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			dec_mm_counter(mm, mm_counter_file(folio));
 		}
 discard:
-		if (unlikely(folio_test_hugetlb(folio)))
+		if (unlikely(folio_test_hugetlb(folio))) {
 			hugetlb_remove_rmap(folio);
-		else
-			folio_remove_rmap_pte(folio, subpage, vma);
+		} else {
+			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+			folio_ref_sub(folio, nr_pages - 1);
+		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
 		folio_put(folio);
+		/* We have already batched the entire folio */
+		if (nr_pages > 1)
+			goto walk_done;
 		continue;
 walk_abort:
 		ret = false;
-- 
2.39.3 (Apple Git-146)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06  3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
@ 2025-01-06  6:40   ` Baolin Wang
  2025-01-06  9:03     ` Barry Song
  0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2025-01-06  6:40 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
	ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	Barry Song



On 2025/1/6 11:17, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> The refcount may be temporarily or long-term increased, but this does
> not change the fundamental nature of the folio already being lazy-
> freed. Therefore, we only reset 'swapbacked' when we are certain the
> folio is dirty and not droppable.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

The changes look good to me. While we are at it, could you also change 
the __discard_anon_folio_pmd_locked() to follow the same strategy for 
lazy-freed PMD-sized folio?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
  2025-01-06  3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
@ 2025-01-06  8:22   ` kernel test robot
  2025-01-13  0:55     ` Barry Song
  2025-01-06 10:07   ` kernel test robot
  1 sibling, 1 reply; 19+ messages in thread
From: kernel test robot @ 2025-01-06  8:22 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: oe-kbuild-all, linux-arm-kernel, x86, linux-kernel, ioworker0,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Anshuman Khandual, Shaoqin Huang, Gavin Shan,
	Kefeng Wang, Mark Rutland, Kirill A. Shutemov, Yosry Ahmed

Hi Barry,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
config: i386-buildonly-randconfig-002-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501061535.zx9E486H-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/uaccess.h:17,
                    from include/linux/uaccess.h:12,
                    from include/linux/sched/task.h:13,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs.h:33,
                    from include/linux/cgroup.h:17,
                    from include/linux/memcontrol.h:13,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/x86/kernel/asm-offsets.c:14:
>> arch/x86/include/asm/tlbflush.h:283:46: error: unknown type name 'unsignd'; did you mean 'unsigned'?
     283 |                                              unsignd long size)
         |                                              ^~~~~~~
         |                                              unsigned
   make[3]: *** [scripts/Makefile.build:102: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=998720002
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1263: prepare0] Error 2 shuffle=998720002
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
   make: Target 'prepare' not remade because of errors.


vim +283 arch/x86/include/asm/tlbflush.h

   279	
   280	static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
   281						     struct mm_struct *mm,
   282						     unsigned long uaddr,
 > 283						     unsignd long size)
   284	{
   285		inc_mm_tlb_gen(mm);
   286		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
   287		mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
   288	}
   289	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06  6:40   ` Baolin Wang
@ 2025-01-06  9:03     ` Barry Song
  2025-01-06  9:34       ` Baolin Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-06  9:03 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	Barry Song

On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 11:17, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > The refcount may be temporarily or long-term increased, but this does
> > not change the fundamental nature of the folio already being lazy-
> > freed. Therefore, we only reset 'swapbacked' when we are certain the
> > folio is dirty and not droppable.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>
> The changes look good to me. While we are at it, could you also change
> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> lazy-freed PMD-sized folio?

it seems you mean __discard_anon_folio_pmd_locked() is lacking
folio_set_swapbacked(folio) for dirty pmd-mapped folios?
and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
handled properly?

Thanks
barry


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06  9:03     ` Barry Song
@ 2025-01-06  9:34       ` Baolin Wang
  2025-01-06 14:39         ` Lance Yang
  0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2025-01-06  9:34 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	Barry Song



On 2025/1/6 17:03, Barry Song wrote:
> On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/1/6 11:17, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> The refcount may be temporarily or long-term increased, but this does
>>> not change the fundamental nature of the folio already being lazy-
>>> freed. Therefore, we only reset 'swapbacked' when we are certain the
>>> folio is dirty and not droppable.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>
>> The changes look good to me. While we are at it, could you also change
>> the __discard_anon_folio_pmd_locked() to follow the same strategy for
>> lazy-freed PMD-sized folio?
> 
> it seems you mean __discard_anon_folio_pmd_locked() is lacking
> folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> handled properly?

Right.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
  2025-01-06  3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
  2025-01-06  8:22   ` kernel test robot
@ 2025-01-06 10:07   ` kernel test robot
  2025-01-13  0:56     ` Barry Song
  1 sibling, 1 reply; 19+ messages in thread
From: kernel test robot @ 2025-01-06 10:07 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: oe-kbuild-all, linux-arm-kernel, x86, linux-kernel, ioworker0,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Anshuman Khandual, Shaoqin Huang, Gavin Shan,
	Kefeng Wang, Mark Rutland, Kirill A. Shutemov, Yosry Ahmed

Hi Barry,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
config: riscv-randconfig-001-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501061736.FoHcInHJ-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'set_tlb_ubc_flush_pending':
>> mm/rmap.c:685:9: error: too many arguments to function 'arch_tlbbatch_add_pending'
     685 |         arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from arch/riscv/include/asm/pgtable.h:113,
                    from include/linux/pgtable.h:6,
                    from include/linux/mm.h:30,
                    from mm/rmap.c:55:
   arch/riscv/include/asm/tlbflush.h:62:6: note: declared here
      62 | void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~


vim +/arch_tlbbatch_add_pending +685 mm/rmap.c

   663	
   664	/*
   665	 * Bits 0-14 of mm->tlb_flush_batched record pending generations.
   666	 * Bits 16-30 of mm->tlb_flush_batched bit record flushed generations.
   667	 */
   668	#define TLB_FLUSH_BATCH_FLUSHED_SHIFT	16
   669	#define TLB_FLUSH_BATCH_PENDING_MASK			\
   670		((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1)
   671	#define TLB_FLUSH_BATCH_PENDING_LARGE			\
   672		(TLB_FLUSH_BATCH_PENDING_MASK / 2)
   673	
   674	static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
   675					      unsigned long uaddr,
   676					      unsigned long size)
   677	{
   678		struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
   679		int batch;
   680		bool writable = pte_dirty(pteval);
   681	
   682		if (!pte_accessible(mm, pteval))
   683			return;
   684	
 > 685		arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
   686		tlb_ubc->flush_required = true;
   687	
   688		/*
   689		 * Ensure compiler does not re-order the setting of tlb_flush_batched
   690		 * before the PTE is cleared.
   691		 */
   692		barrier();
   693		batch = atomic_read(&mm->tlb_flush_batched);
   694	retry:
   695		if ((batch & TLB_FLUSH_BATCH_PENDING_MASK) > TLB_FLUSH_BATCH_PENDING_LARGE) {
   696			/*
   697			 * Prevent `pending' from catching up with `flushed' because of
   698			 * overflow.  Reset `pending' and `flushed' to be 1 and 0 if
   699			 * `pending' becomes large.
   700			 */
   701			if (!atomic_try_cmpxchg(&mm->tlb_flush_batched, &batch, 1))
   702				goto retry;
   703		} else {
   704			atomic_inc(&mm->tlb_flush_batched);
   705		}
   706	
   707		/*
   708		 * If the PTE was dirty then it's best to assume it's writable. The
   709		 * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
   710		 * before the page is queued for IO.
   711		 */
   712		if (writable)
   713			tlb_ubc->writable = true;
   714	}
   715	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06  9:34       ` Baolin Wang
@ 2025-01-06 14:39         ` Lance Yang
  2025-01-06 20:52           ` Barry Song
  2025-01-07  1:33           ` Lance Yang
  0 siblings, 2 replies; 19+ messages in thread
From: Lance Yang @ 2025-01-06 14:39 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Barry Song, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	Barry Song

On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 17:03, Barry Song wrote:
> > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 2025/1/6 11:17, Barry Song wrote:
> >>> From: Barry Song <v-songbaohua@oppo.com>
> >>>
> >>> The refcount may be temporarily or long-term increased, but this does
> >>> not change the fundamental nature of the folio already being lazy-
> >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> >>> folio is dirty and not droppable.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>
> >> The changes look good to me. While we are at it, could you also change
> >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> >> lazy-freed PMD-sized folio?
> >
> > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > folio_set_swapbacked(folio) for dirty pmd-mapped folios?

Good catch!

Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
THPs in __discard_anon_folio_pmd_locked() - possibly to align with
previous behavior ;)

If a dirty PMD-mapped THP cannot be discarded, we just split it and
restart the page walk to process the PTE-mapped THP. After that, we
will only mark each folio within the THP as swap-backed individually.

It seems like we could cut the work by calling folio_set_swapbacked()
for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
the restart of the page walk after splitting the THP, IMHO ;)

Thanks,
Lance


> > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > handled properly?


>
> Right.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation
  2025-01-06  3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
                   ` (2 preceding siblings ...)
  2025-01-06  3:17 ` [PATCH 3/3] mm: Support batched unmap for lazyfree large folios during reclamation Barry Song
@ 2025-01-06 17:28 ` Lorenzo Stoakes
  2025-01-06 19:15   ` Barry Song
  3 siblings, 1 reply; 19+ messages in thread
From: Lorenzo Stoakes @ 2025-01-06 17:28 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song

On Mon, Jan 06, 2025 at 04:17:08PM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
> shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c.
> However, those folios are still added to the deferred_split list in
> try_to_unmap_one() because we are unmapping PTEs and removing rmap entries
> one by one. This approach is not only slow but also increases the risk of a
> race condition where lazyfree folios are incorrectly set back to swapbacked,
> as a speculative folio_get may occur in the shrinker's callback.
>
> This patchset addresses the issue by only marking truly dirty folios as
> swapbacked as suggested by David and shifting to batched unmapping of the
> entire folio in  try_to_unmap_one(). As a result, we've observed
> deferred_split dropping to  zero and significant performance improvements
> in memory reclamation.

You've not provided any numbers? What performance improvements? Under what
workloads?

You're adding a bunch of complexity here, so I feel like we need to see
some numbers, background, etc.?

Thanks!

>
> Barry Song (3):
>   mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
>   mm: Support tlbbatch flush for a range of PTEs
>   mm: Support batched unmap for lazyfree large folios during reclamation
>
>  arch/arm64/include/asm/tlbflush.h |  26 ++++----
>  arch/arm64/mm/contpte.c           |   2 +-
>  arch/x86/include/asm/tlbflush.h   |   3 +-
>  mm/rmap.c                         | 103 ++++++++++++++++++++----------
>  4 files changed, 85 insertions(+), 49 deletions(-)
>
> --
> 2.39.3 (Apple Git-146)
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation
  2025-01-06 17:28 ` [PATCH 0/3] mm: batched unmap " Lorenzo Stoakes
@ 2025-01-06 19:15   ` Barry Song
  0 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06 19:15 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	baolin.wang, Barry Song

On Tue, Jan 7, 2025 at 6:28 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Mon, Jan 06, 2025 at 04:17:08PM +1300, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
> > shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c.
> > However, those folios are still added to the deferred_split list in
> > try_to_unmap_one() because we are unmapping PTEs and removing rmap entries
> > one by one. This approach is not only slow but also increases the risk of a
> > race condition where lazyfree folios are incorrectly set back to swapbacked,
> > as a speculative folio_get may occur in the shrinker's callback.
> >
> > This patchset addresses the issue by only marking truly dirty folios as
> > swapbacked as suggested by David and shifting to batched unmapping of the
> > entire folio in  try_to_unmap_one(). As a result, we've observed
> > deferred_split dropping to  zero and significant performance improvements
> > in memory reclamation.
>
> You've not provided any numbers? What performance improvements? Under what
> workloads?

The numbers can be found in patch 3/3 at the following link:
https://lore.kernel.org/linux-mm/20250106031711.82855-4-21cnbao@gmail.com/

Reclaiming lazyfree mTHP will now be significantly faster. Additionally,
this patch addresses the issue of the misleading split_deferred counter.
The counter was intended to track operations like unaligned unmap/madvise,
but in practice the majority of split_deferred cases result from memory
reclamation of aligned lazyfree mTHP. This discrepancy rendered the
counter highly misleading.

>
> You're adding a bunch of complexity here, so I feel like we need to see
> some numbers, background, etc.?

I agree that I can provide more details in v2. In the meantime, you can
find additional background information here:

https://lore.kernel.org/linux-mm/CAGsJ_4wOL6TLa3FKQASdrGfuqqu=14EuxAtpKmnebiGLm0dnfA@mail.gmail.com/

>
> Thanks!
>
> >
> > Barry Song (3):
> >   mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
> >   mm: Support tlbbatch flush for a range of PTEs
> >   mm: Support batched unmap for lazyfree large folios during reclamation
> >
> >  arch/arm64/include/asm/tlbflush.h |  26 ++++----
> >  arch/arm64/mm/contpte.c           |   2 +-
> >  arch/x86/include/asm/tlbflush.h   |   3 +-
> >  mm/rmap.c                         | 103 ++++++++++++++++++++----------
> >  4 files changed, 85 insertions(+), 49 deletions(-)
> >
> > --
> > 2.39.3 (Apple Git-146)

Thanks
Barry


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06 14:39         ` Lance Yang
@ 2025-01-06 20:52           ` Barry Song
  2025-01-06 20:56             ` Barry Song
  2025-01-07  1:33           ` Lance Yang
  1 sibling, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-06 20:52 UTC (permalink / raw)
  To: Lance Yang
  Cc: Baolin Wang, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	Barry Song

On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)

Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and
then iterates over each PTE. I’m not sure why it’s designed this way—could
there be a specific reason behind this approach?

However, it does appear to handle folio_set_swapbacked() correctly, as only
a dirty PMD will result in dirty PTEs being generated in
__split_huge_pmd_locked():

        } else {
                pte_t entry;

                entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
                if (write)
                        entry = pte_mkwrite(entry, vma);

                if (!young)
                        entry = pte_mkold(entry);

                /* NOTE: this may set soft-dirty too on some archs */
                if (dirty)
                        entry = pte_mkdirty(entry);

                if (soft_dirty)
                        entry = pte_mksoft_dirty(entry);

                if (uffd_wp)
                        entry = pte_mkuffd_wp(entry);

                for (i = 0; i < HPAGE_PMD_NR; i++)
                        VM_WARN_ON(!pte_none(ptep_get(pte + i)));

                set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
        }



>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.

Thanks
Barry


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06 20:52           ` Barry Song
@ 2025-01-06 20:56             ` Barry Song
  0 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06 20:56 UTC (permalink / raw)
  To: Lance Yang
  Cc: Baolin Wang, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
	david, ryan.roberts, zhengtangquan, kasong, chrisl, Barry Song,
	Ying Huang

On Tue, Jan 7, 2025 at 9:52 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2025/1/6 17:03, Barry Song wrote:
> > > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > > <baolin.wang@linux.alibaba.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2025/1/6 11:17, Barry Song wrote:
> > > >>> From: Barry Song <v-songbaohua@oppo.com>
> > > >>>
> > > >>> The refcount may be temporarily or long-term increased, but this does
> > > >>> not change the fundamental nature of the folio already being lazy-
> > > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > > >>> folio is dirty and not droppable.
> > > >>>
> > > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > >>
> > > >> The changes look good to me. While we are at it, could you also change
> > > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > > >> lazy-freed PMD-sized folio?
> > > >
> > > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> >
> > Good catch!
> >
> > Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> > THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> > previous behavior ;)
> >
> > If a dirty PMD-mapped THP cannot be discarded, we just split it and
> > restart the page walk to process the PTE-mapped THP. After that, we
> > will only mark each folio within the THP as swap-backed individually.
> >
> > It seems like we could cut the work by calling folio_set_swapbacked()
> > for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> > the restart of the page walk after splitting the THP, IMHO ;)
>
> Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
> the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and

Apologies for the typo, I meant splitting a PMD-mapped THP into a PTE-mapped
THP.

> then iterates over each PTE. I’m not sure why it’s designed this way—could
> there be a specific reason behind this approach?
>
> However, it does appear to handle folio_set_swapbacked() correctly, as only
> a dirty PMD will result in dirty PTEs being generated in
> __split_huge_pmd_locked():
>
>         } else {
>                 pte_t entry;
>
>                 entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
>                 if (write)
>                         entry = pte_mkwrite(entry, vma);
>
>                 if (!young)
>                         entry = pte_mkold(entry);
>
>                 /* NOTE: this may set soft-dirty too on some archs */
>                 if (dirty)
>                         entry = pte_mkdirty(entry);
>
>                 if (soft_dirty)
>                         entry = pte_mksoft_dirty(entry);
>
>                 if (uffd_wp)
>                         entry = pte_mkuffd_wp(entry);
>
>                 for (i = 0; i < HPAGE_PMD_NR; i++)
>                         VM_WARN_ON(!pte_none(ptep_get(pte + i)));
>
>                 set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
>         }
>
>
>
> >
> > Thanks,
> > Lance
> >
> >
> > > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > > handled properly?
> >
> >
> > >
> > > Right.
>
> Thanks
> Barry


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  2025-01-06 14:39         ` Lance Yang
  2025-01-06 20:52           ` Barry Song
@ 2025-01-07  1:33           ` Lance Yang
  1 sibling, 0 replies; 19+ messages in thread
From: Lance Yang @ 2025-01-07  1:33 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Barry Song, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
	david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
	Barry Song

On Mon, Jan 6, 2025 at 10:39 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)

A correction to the earlier email:

folio_set_swapbacked() is only called in __discard_anon_folio_pmd_locked()
when '!(vma->vm_flags & VM_DROPPABLE)' is true, IIUC.

Thanks,
Lance


>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
  2025-01-06  8:22   ` kernel test robot
@ 2025-01-13  0:55     ` Barry Song
  2025-01-13 13:13       ` Oliver Sang
  0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-13  0:55 UTC (permalink / raw)
  To: lkp
  Cc: 21cnbao, akpm, anshuman.khandual, baolin.wang, bp,
	catalin.marinas, chrisl, dave.hansen, david, gshan, hpa,
	ioworker0, kasong, kirill.shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, mark.rutland, mingo, oe-kbuild-all,
	ryan.roberts, shahuang, tglx, v-songbaohua, wangkefeng.wang,
	will, x86, ying.huang, yosryahmed, zhengtangquan

On Mon, Jan 6, 2025 at 9:23 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Barry,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on akpm-mm/mm-everything]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> config: i386-buildonly-randconfig-002-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202501061535.zx9E486H-lkp@intel.com/

Sorry. My bad, does the below fix the build?

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cda35f53f544..4b62a6329b8f 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -280,7 +280,7 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 					     struct mm_struct *mm,
 					     unsigned long uaddr,
-					     unsignd long size)
+					     unsigned long size)
 {
 	inc_mm_tlb_gen(mm);
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));

>
> All errors (new ones prefixed by >>):
>
>    In file included from arch/x86/include/asm/uaccess.h:17,
>                     from include/linux/uaccess.h:12,
>                     from include/linux/sched/task.h:13,
>                     from include/linux/sched/signal.h:9,
>                     from include/linux/rcuwait.h:6,
>                     from include/linux/percpu-rwsem.h:7,
>                     from include/linux/fs.h:33,
>                     from include/linux/cgroup.h:17,
>                     from include/linux/memcontrol.h:13,
>                     from include/linux/swap.h:9,
>                     from include/linux/suspend.h:5,
>                     from arch/x86/kernel/asm-offsets.c:14:
> >> arch/x86/include/asm/tlbflush.h:283:46: error: unknown type name 'unsignd'; did you mean 'unsigned'?
>      283 |                                              unsignd long size)
>          |                                              ^~~~~~~
>          |                                              unsigned
>    make[3]: *** [scripts/Makefile.build:102: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=998720002
>    make[3]: Target 'prepare' not remade because of errors.
>    make[2]: *** [Makefile:1263: prepare0] Error 2 shuffle=998720002
>    make[2]: Target 'prepare' not remade because of errors.
>    make[1]: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
>    make[1]: Target 'prepare' not remade because of errors.
>    make: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
>    make: Target 'prepare' not remade because of errors.
>
>
> vim +283 arch/x86/include/asm/tlbflush.h
>
>    279 
>    280  static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>    281                                               struct mm_struct *mm,
>    282                                               unsigned long uaddr,
>  > 283                                               unsignd long size)
>    284  {
>    285          inc_mm_tlb_gen(mm);
>    286          cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
>    287          mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
>    288  }
>    289 
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki



* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
  2025-01-06 10:07   ` kernel test robot
@ 2025-01-13  0:56     ` Barry Song
  2025-01-13  7:30       ` Oliver Sang
  0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-13  0:56 UTC (permalink / raw)
  To: lkp
  Cc: 21cnbao, akpm, anshuman.khandual, baolin.wang, bp,
	catalin.marinas, chrisl, dave.hansen, david, gshan, hpa,
	ioworker0, kasong, kirill.shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, mark.rutland, mingo, oe-kbuild-all,
	ryan.roberts, shahuang, tglx, v-songbaohua, wangkefeng.wang,
	will, x86, ying.huang, yosryahmed, zhengtangquan

On Mon, Jan 6, 2025 at 11:08 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Barry,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on akpm-mm/mm-everything]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> config: riscv-randconfig-001-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/config)
> compiler: riscv64-linux-gcc (GCC) 14.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202501061736.FoHcInHJ-lkp@intel.com/
>

Sorry, my bad. Does the diff below fix the build?

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 72e559934952..7f3ea687ce33 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -61,7 +61,8 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 bool arch_tlbbatch_should_defer(struct mm_struct *mm);
 void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			       struct mm_struct *mm,
-			       unsigned long uaddr);
+			       unsigned long uaddr,
+			       unsigned long size);
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 9b6e86ce3867..aeda64a36d50 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -187,7 +187,8 @@ bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 
 void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			       struct mm_struct *mm,
-			       unsigned long uaddr)
+			       unsigned long uaddr,
+			       unsigned long size)
 {
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
 }
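
As the diff shows, the riscv stub keeps its previous behaviour: it only accumulates the CPU mask and leaves the new uaddr/size arguments unused, so the actual invalidation granularity is still decided at arch_tlbbatch_flush() time. Purely as a hypothetical sketch (the struct layout and helper below are invented, not riscv code and not part of this patch), an architecture that can invalidate by range might record the span for a later per-range flush:

/* Hypothetical illustration only; assumes start/end are reset to
 * ULONG_MAX/0 whenever the batch is (re)initialised. */
struct hypothetical_tlb_batch {
	struct cpumask	cpumask;
	unsigned long	start;	/* lowest pending user address */
	unsigned long	end;	/* one past the highest pending address */
};

static void hypothetical_tlbbatch_add_pending(struct hypothetical_tlb_batch *batch,
					      struct mm_struct *mm,
					      unsigned long uaddr,
					      unsigned long size)
{
	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
	/* Widen the recorded span to cover [uaddr, uaddr + size). */
	batch->start = min(batch->start, uaddr);
	batch->end = max(batch->end, uaddr + size);
}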

> All errors (new ones prefixed by >>):
>
>    mm/rmap.c: In function 'set_tlb_ubc_flush_pending':
> >> mm/rmap.c:685:9: error: too many arguments to function 'arch_tlbbatch_add_pending'
>      685 |         arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
>          |         ^~~~~~~~~~~~~~~~~~~~~~~~~
>    In file included from arch/riscv/include/asm/pgtable.h:113,
>                     from include/linux/pgtable.h:6,
>                     from include/linux/mm.h:30,
>                     from mm/rmap.c:55:
>    arch/riscv/include/asm/tlbflush.h:62:6: note: declared here
>       62 | void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>          |      ^~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> vim +/arch_tlbbatch_add_pending +685 mm/rmap.c
>
>    663 
>    664  /*
>    665   * Bits 0-14 of mm->tlb_flush_batched record pending generations.
>    666   * Bits 16-30 of mm->tlb_flush_batched bit record flushed generations.
>    667   */
>    668  #define TLB_FLUSH_BATCH_FLUSHED_SHIFT   16
>    669  #define TLB_FLUSH_BATCH_PENDING_MASK                    \
>    670          ((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1)
>    671  #define TLB_FLUSH_BATCH_PENDING_LARGE                   \
>    672          (TLB_FLUSH_BATCH_PENDING_MASK / 2)
>    673 
>    674  static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
>    675                                        unsigned long uaddr,
>    676                                        unsigned long size)
>    677  {
>    678          struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
>    679          int batch;
>    680          bool writable = pte_dirty(pteval);
>    681 
>    682          if (!pte_accessible(mm, pteval))
>    683                  return;
>    684 
>  > 685          arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
>    686          tlb_ubc->flush_required = true;
>    687 
>    688          /*
>    689           * Ensure compiler does not re-order the setting of tlb_flush_batched
>    690           * before the PTE is cleared.
>    691           */
>    692          barrier();
>    693          batch = atomic_read(&mm->tlb_flush_batched);
>    694  retry:
>    695          if ((batch & TLB_FLUSH_BATCH_PENDING_MASK) > TLB_FLUSH_BATCH_PENDING_LARGE) {
>    696                  /*
>    697                   * Prevent `pending' from catching up with `flushed' because of
>    698                   * overflow.  Reset `pending' and `flushed' to be 1 and 0 if
>    699                   * `pending' becomes large.
>    700                   */
>    701                  if (!atomic_try_cmpxchg(&mm->tlb_flush_batched, &batch, 1))
>    702                          goto retry;
>    703          } else {
>    704                  atomic_inc(&mm->tlb_flush_batched);
>    705          }
>    706 
>    707          /*
>    708           * If the PTE was dirty then it's best to assume it's writable. The
>    709           * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
>    710           * before the page is queued for IO.
>    711           */
>    712          if (writable)
>    713                  tlb_ubc->writable = true;
>    714  }
>    715 
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
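
For readers following the quoted context: the comments around set_tlb_ubc_flush_pending() describe how mm->tlb_flush_batched packs two generation counters into one atomic, with bits 0-14 holding the pending generation and bits 16-30 the flushed generation. Below is a small standalone sketch (illustration only, not code from the patch or from mm/rmap.c) of that layout and of the bound at which the quoted retry/cmpxchg path resets the counters:

#include <stdio.h>

/* Same macro values as in the quoted mm/rmap.c excerpt. */
#define TLB_FLUSH_BATCH_FLUSHED_SHIFT	16
#define TLB_FLUSH_BATCH_PENDING_MASK \
	((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1)
#define TLB_FLUSH_BATCH_PENDING_LARGE \
	(TLB_FLUSH_BATCH_PENDING_MASK / 2)

int main(void)
{
	/* Example value: flushed generation 3, pending generation 5. */
	int batch = (3 << TLB_FLUSH_BATCH_FLUSHED_SHIFT) | 5;
	int pending = batch & TLB_FLUSH_BATCH_PENDING_MASK;
	int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;

	/* A difference between the two generations indicates a deferred
	 * flush that has not completed yet. */
	printf("pending=%d flushed=%d outstanding=%d\n",
	       pending, flushed, pending != flushed);

	/* The quoted retry loop resets the value to 1 (pending=1,
	 * flushed=0) once `pending' exceeds this bound, before it can
	 * wrap into the flushed bits. */
	printf("reset threshold: pending > %d\n", TLB_FLUSH_BATCH_PENDING_LARGE);
	return 0;
}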



* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
  2025-01-13  0:56     ` Barry Song
@ 2025-01-13  7:30       ` Oliver Sang
  0 siblings, 0 replies; 19+ messages in thread
From: Oliver Sang @ 2025-01-13  7:30 UTC (permalink / raw)
  To: Barry Song
  Cc: lkp, akpm, anshuman.khandual, baolin.wang, bp, catalin.marinas,
	chrisl, dave.hansen, david, gshan, hpa, ioworker0, kasong,
	kirill.shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	mark.rutland, mingo, oe-kbuild-all, ryan.roberts, shahuang, tglx,
	v-songbaohua, wangkefeng.wang, will, x86, ying.huang, yosryahmed,
	zhengtangquan, oliver.sang

Hi Barry,

On Mon, Jan 13, 2025 at 01:56:26PM +1300, Barry Song wrote:
> On Mon, Jan 6, 2025 at 11:08 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Barry,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on akpm-mm/mm-everything]
> >
> > url:    https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link:    https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> > patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> > config: riscv-randconfig-001-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/config)
> > compiler: riscv64-linux-gcc (GCC) 14.2.0
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202501061736.FoHcInHJ-lkp@intel.com/
> >
> 
> Sorry, my bad. Does the diff below fix the build?

Yes, the diff below fixes the build. Thanks.

Tested-by: kernel test robot <oliver.sang@intel.com>

> 
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 72e559934952..7f3ea687ce33 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -61,7 +61,8 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
>  bool arch_tlbbatch_should_defer(struct mm_struct *mm);
>  void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>  			       struct mm_struct *mm,
> -			       unsigned long uaddr);
> +			       unsigned long uaddr,
> +			       unsigned long size);
>  void arch_flush_tlb_batched_pending(struct mm_struct *mm);
>  void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>  
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 9b6e86ce3867..aeda64a36d50 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -187,7 +187,8 @@ bool arch_tlbbatch_should_defer(struct mm_struct *mm)
>  
>  void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>  			       struct mm_struct *mm,
> -			       unsigned long uaddr)
> +			       unsigned long uaddr,
> +			       unsigned long size)
>  {
>  	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
>  }
> 
> > All errors (new ones prefixed by >>):
> >
> >    mm/rmap.c: In function 'set_tlb_ubc_flush_pending':
> > >> mm/rmap.c:685:9: error: too many arguments to function 'arch_tlbbatch_add_pending'
> >      685 |         arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
> >          |         ^~~~~~~~~~~~~~~~~~~~~~~~~
> >    In file included from arch/riscv/include/asm/pgtable.h:113,
> >                     from include/linux/pgtable.h:6,
> >                     from include/linux/mm.h:30,
> >                     from mm/rmap.c:55:
> >    arch/riscv/include/asm/tlbflush.h:62:6: note: declared here
> >       62 | void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> >          |      ^~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >
> > vim +/arch_tlbbatch_add_pending +685 mm/rmap.c
> >
> >    663 
> >    664  /*
> >    665   * Bits 0-14 of mm->tlb_flush_batched record pending generations.
> >    666   * Bits 16-30 of mm->tlb_flush_batched bit record flushed generations.
> >    667   */
> >    668  #define TLB_FLUSH_BATCH_FLUSHED_SHIFT   16
> >    669  #define TLB_FLUSH_BATCH_PENDING_MASK                    \
> >    670          ((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1)
> >    671  #define TLB_FLUSH_BATCH_PENDING_LARGE                   \
> >    672          (TLB_FLUSH_BATCH_PENDING_MASK / 2)
> >    673 
> >    674  static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
> >    675                                        unsigned long uaddr,
> >    676                                        unsigned long size)
> >    677  {
> >    678          struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
> >    679          int batch;
> >    680          bool writable = pte_dirty(pteval);
> >    681 
> >    682          if (!pte_accessible(mm, pteval))
> >    683                  return;
> >    684 
> >  > 685          arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
> >    686          tlb_ubc->flush_required = true;
> >    687 
> >    688          /*
> >    689           * Ensure compiler does not re-order the setting of tlb_flush_batched
> >    690           * before the PTE is cleared.
> >    691           */
> >    692          barrier();
> >    693          batch = atomic_read(&mm->tlb_flush_batched);
> >    694  retry:
> >    695          if ((batch & TLB_FLUSH_BATCH_PENDING_MASK) > TLB_FLUSH_BATCH_PENDING_LARGE) {
> >    696                  /*
> >    697                   * Prevent `pending' from catching up with `flushed' because of
> >    698                   * overflow.  Reset `pending' and `flushed' to be 1 and 0 if
> >    699                   * `pending' becomes large.
> >    700                   */
> >    701                  if (!atomic_try_cmpxchg(&mm->tlb_flush_batched, &batch, 1))
> >    702                          goto retry;
> >    703          } else {
> >    704                  atomic_inc(&mm->tlb_flush_batched);
> >    705          }
> >    706 
> >    707          /*
> >    708           * If the PTE was dirty then it's best to assume it's writable. The
> >    709           * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
> >    710           * before the page is queued for IO.
> >    711           */
> >    712          if (writable)
> >    713                  tlb_ubc->writable = true;
> >    714  }
> >    715 
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> 



* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
  2025-01-13  0:55     ` Barry Song
@ 2025-01-13 13:13       ` Oliver Sang
  0 siblings, 0 replies; 19+ messages in thread
From: Oliver Sang @ 2025-01-13 13:13 UTC (permalink / raw)
  To: Barry Song
  Cc: lkp, akpm, anshuman.khandual, baolin.wang, bp, catalin.marinas,
	chrisl, dave.hansen, david, gshan, hpa, ioworker0, kasong,
	kirill.shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	mark.rutland, mingo, oe-kbuild-all, ryan.roberts, shahuang, tglx,
	v-songbaohua, wangkefeng.wang, will, x86, ying.huang, yosryahmed,
	zhengtangquan, oliver.sang

Hi Barry,

On Mon, Jan 13, 2025 at 01:55:04PM +1300, Barry Song wrote:
> On Mon, Jan 6, 2025 at 9:23 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Barry,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on akpm-mm/mm-everything]
> >
> > url:    https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link:    https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> > patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> > config: i386-buildonly-randconfig-002-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/config)
> > compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202501061535.zx9E486H-lkp@intel.com/
> 
> Sorry, my bad. Does the diff below fix the build?

Yes, the diff below fixes the build. Thanks.

Tested-by: kernel test robot <oliver.sang@intel.com>

> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index cda35f53f544..4b62a6329b8f 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -280,7 +280,7 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
>  static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>  					     struct mm_struct *mm,
>  					     unsigned long uaddr,
> -					     unsignd long size)
> +					     unsigned long size)
>  {
>  	inc_mm_tlb_gen(mm);
>  	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> 
> >
> > All errors (new ones prefixed by >>):
> >
> >    In file included from arch/x86/include/asm/uaccess.h:17,
> >                     from include/linux/uaccess.h:12,
> >                     from include/linux/sched/task.h:13,
> >                     from include/linux/sched/signal.h:9,
> >                     from include/linux/rcuwait.h:6,
> >                     from include/linux/percpu-rwsem.h:7,
> >                     from include/linux/fs.h:33,
> >                     from include/linux/cgroup.h:17,
> >                     from include/linux/memcontrol.h:13,
> >                     from include/linux/swap.h:9,
> >                     from include/linux/suspend.h:5,
> >                     from arch/x86/kernel/asm-offsets.c:14:
> > >> arch/x86/include/asm/tlbflush.h:283:46: error: unknown type name 'unsignd'; did you mean 'unsigned'?
> >      283 |                                              unsignd long size)
> >          |                                              ^~~~~~~
> >          |                                              unsigned
> >    make[3]: *** [scripts/Makefile.build:102: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=998720002
> >    make[3]: Target 'prepare' not remade because of errors.
> >    make[2]: *** [Makefile:1263: prepare0] Error 2 shuffle=998720002
> >    make[2]: Target 'prepare' not remade because of errors.
> >    make[1]: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
> >    make[1]: Target 'prepare' not remade because of errors.
> >    make: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
> >    make: Target 'prepare' not remade because of errors.
> >
> >
> > vim +283 arch/x86/include/asm/tlbflush.h
> >
> >    279 
> >    280  static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> >    281                                               struct mm_struct *mm,
> >    282                                               unsigned long uaddr,
> >  > 283                                               unsignd long size)
> >    284  {
> >    285          inc_mm_tlb_gen(mm);
> >    286          cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> >    287          mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
> >    288  }
> >    289 
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki


