* [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation
@ 2025-01-06 3:17 Barry Song
2025-01-06 3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06 3:17 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song
From: Barry Song <v-songbaohua@oppo.com>
Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c.
However, those folios are still added to the deferred_split list in
try_to_unmap_one() because we are unmapping PTEs and removing rmap entries
one by one. This approach is not only slow but also increases the risk of a
race condition where lazyfree folios are incorrectly set back to swapbacked,
as a speculative folio_get may occur in the shrinker's callback.
This patchset addresses the issue by only marking truly dirty folios as
swapbacked, as suggested by David, and by shifting to batched unmapping
of the entire folio in try_to_unmap_one(). As a result, we've observed
deferred_split dropping to zero and significant performance improvements
in memory reclamation.
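As a rough sketch of the shape of the change (simplified pseudo-C with
variable setup omitted; the helpers named here are the ones the series
actually uses in patch 3/3):

	/* before: unmap a large folio one PTE at a time; a partially
	 * unmapped folio ends up on the deferred_split list */
	for (i = 0; i < nr_pages; i++) {
		pteval = ptep_get_and_clear(mm, addr + i * PAGE_SIZE, pte + i);
		folio_remove_rmap_pte(folio, page + i, vma);
	}

	/* after: clear all PTEs and drop all rmap entries of the lazyfree
	 * folio in one batch, so it is never seen partially mapped */
	pteval = get_and_clear_full_ptes(mm, addr, pte, nr_pages, 0);
	folio_remove_rmap_ptes(folio, page, nr_pages, vma);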
Barry Song (3):
mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
mm: Support tlbbatch flush for a range of PTEs
mm: Support batched unmap for lazyfree large folios during reclamation
arch/arm64/include/asm/tlbflush.h | 26 ++++----
arch/arm64/mm/contpte.c | 2 +-
arch/x86/include/asm/tlbflush.h | 3 +-
mm/rmap.c | 103 ++++++++++++++++++++----------
4 files changed, 85 insertions(+), 49 deletions(-)
--
2.39.3 (Apple Git-146)
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
@ 2025-01-06 3:17 ` Barry Song
2025-01-06 6:40 ` Baolin Wang
2025-01-06 3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
` (2 subsequent siblings)
3 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-06 3:17 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song
From: Barry Song <v-songbaohua@oppo.com>
The refcount may be temporarily or long-term increased, but this does
not change the fundamental nature of the folio already being lazy-
freed. Therefore, we only reset 'swapbacked' when we are certain the
folio is dirty and not droppable.
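Condensed from the hunk below, the new decision logic for a lazyfree
folio becomes:

	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
		/* truly redirtied: remap and make it swap-backed again */
		set_pte_at(mm, address, pvmw.pte, pteval);
		folio_set_swapbacked(folio);
		goto walk_abort;
	} else if (ref_count != 1 + map_count) {
		/* extra (GUP/speculative) reference: back off, but do
		 * not reset 'swapbacked' */
		set_pte_at(mm, address, pvmw.pte, pteval);
		goto walk_abort;
	}
	/* clean and unpinned: safe to discard */
	dec_mm_counter(mm, MM_ANONPAGES);
	goto discard;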
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
1 file changed, 22 insertions(+), 27 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*/
smp_rmb();
- /*
- * The only page refs must be one from isolation
- * plus the rmap(s) (dropped by discard:).
- */
- if (ref_count == 1 + map_count &&
- (!folio_test_dirty(folio) ||
- /*
- * Unlike MADV_FREE mappings, VM_DROPPABLE
- * ones can be dropped even if they've
- * been dirtied.
- */
- (vma->vm_flags & VM_DROPPABLE))) {
- dec_mm_counter(mm, MM_ANONPAGES);
- goto discard;
- }
-
- /*
- * If the folio was redirtied, it cannot be
- * discarded. Remap the page to page table.
- */
- set_pte_at(mm, address, pvmw.pte, pteval);
- /*
- * Unlike MADV_FREE mappings, VM_DROPPABLE ones
- * never get swap backed on failure to drop.
- */
- if (!(vma->vm_flags & VM_DROPPABLE))
+ if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+ /*
+ * redirtied either using the page table or a previously
+ * obtained GUP reference.
+ */
+ set_pte_at(mm, address, pvmw.pte, pteval);
folio_set_swapbacked(folio);
- goto walk_abort;
+ goto walk_abort;
+ } else if (ref_count != 1 + map_count) {
+ /*
+ * Additional reference. Could be a GUP reference or any
+ * speculative reference. GUP users must mark the folio
+ * dirty if there was a modification. This folio cannot be
+ * reclaimed right now either way, so act just like nothing
+ * happened.
+ * We'll come back here later and detect if the folio was
+ * dirtied when the additional reference is gone.
+ */
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ goto walk_abort;
+ }
+ dec_mm_counter(mm, MM_ANONPAGES);
+ goto discard;
}
if (swap_duplicate(entry) < 0) {
--
2.39.3 (Apple Git-146)
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
2025-01-06 3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
2025-01-06 3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
@ 2025-01-06 3:17 ` Barry Song
2025-01-06 8:22 ` kernel test robot
2025-01-06 10:07 ` kernel test robot
2025-01-06 3:17 ` [PATCH 3/3] mm: Support batched unmap for lazyfree large folios during reclamation Barry Song
2025-01-06 17:28 ` [PATCH 0/3] mm: batched unmap " Lorenzo Stoakes
3 siblings, 2 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06 3:17 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Anshuman Khandual, Shaoqin Huang, Gavin Shan,
Kefeng Wang, Mark Rutland, Kirill A. Shutemov, Yosry Ahmed
From: Barry Song <v-songbaohua@oppo.com>
This is a preparatory patch to support batch PTE unmapping in
`try_to_unmap_one`. It introduces range handling for the `tlbbatch`
flush; for now, the range is always set to PAGE_SIZE.
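In other words, arch_tlbbatch_add_pending() and set_tlb_ubc_flush_pending()
each gain a size argument, and existing callers keep today's behaviour by
passing PAGE_SIZE (simplified view; see the diff for the real signatures):

	/* old: a pending entry always covered a single page */
	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);

	/* new: a pending entry covers [uaddr, uaddr + size) */
	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
	set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);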
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
arch/arm64/include/asm/tlbflush.h | 26 ++++++++++++++------------
arch/arm64/mm/contpte.c | 2 +-
arch/x86/include/asm/tlbflush.h | 3 ++-
mm/rmap.c | 12 +++++++-----
4 files changed, 24 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..f34e4fab5aa2 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
return true;
}
-static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
- struct mm_struct *mm,
- unsigned long uaddr)
-{
- __flush_tlb_page_nosync(mm, uaddr);
-}
-
/*
* If mprotect/munmap/etc occurs during TLB batched flushing, we need to
* synchronise all the TLBI issued with a DSB to avoid the race mentioned in
@@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsigned long start,
return false;
}
-static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
+static inline void __flush_tlb_range_nosync(struct mm_struct *mm,
unsigned long start, unsigned long end,
unsigned long stride, bool last_level,
int tlb_level)
@@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
pages = (end - start) >> PAGE_SHIFT;
if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
- flush_tlb_mm(vma->vm_mm);
+ flush_tlb_mm(mm);
return;
}
dsb(ishst);
- asid = ASID(vma->vm_mm);
+ asid = ASID(mm);
if (last_level)
__flush_tlb_range_op(vale1is, start, pages, stride, asid,
@@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
__flush_tlb_range_op(vae1is, start, pages, stride, asid,
tlb_level, true, lpa2_is_enabled());
- mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
+ mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
}
static inline void __flush_tlb_range(struct vm_area_struct *vma,
@@ -482,7 +475,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
unsigned long stride, bool last_level,
int tlb_level)
{
- __flush_tlb_range_nosync(vma, start, end, stride,
+ __flush_tlb_range_nosync(vma->vm_mm, start, end, stride,
last_level, tlb_level);
dsb(ish);
}
@@ -533,6 +526,15 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
dsb(ish);
isb();
}
+
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+ struct mm_struct *mm,
+ unsigned long uaddr,
+ unsigned long size)
+{
+ __flush_tlb_range_nosync(mm, uaddr, uaddr + size,
+ PAGE_SIZE, true, 3);
+}
#endif
#endif
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 55107d27d3f8..bcac4f55f9c1 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -335,7 +335,7 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
* eliding the trailing DSB applies here.
*/
addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
- __flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
+ __flush_tlb_range_nosync(vma->vm_mm, addr, addr + CONT_PTE_SIZE,
PAGE_SIZE, true, 3);
}
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..cda35f53f544 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -279,7 +279,8 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
struct mm_struct *mm,
- unsigned long uaddr)
+ unsigned long uaddr,
+ unsignd long size)
{
inc_mm_tlb_gen(mm);
cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
diff --git a/mm/rmap.c b/mm/rmap.c
index de6b8c34e98c..365112af5291 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -672,7 +672,8 @@ void try_to_unmap_flush_dirty(void)
(TLB_FLUSH_BATCH_PENDING_MASK / 2)
static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
- unsigned long uaddr)
+ unsigned long uaddr,
+ unsigned long size)
{
struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
int batch;
@@ -681,7 +682,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
if (!pte_accessible(mm, pteval))
return;
- arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
+ arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
tlb_ubc->flush_required = true;
/*
@@ -757,7 +758,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
}
#else
static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
- unsigned long uaddr)
+ unsigned long uaddr,
+ unsigned long size)
{
}
@@ -1792,7 +1794,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*/
pteval = ptep_get_and_clear(mm, address, pvmw.pte);
- set_tlb_ubc_flush_pending(mm, pteval, address);
+ set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
} else {
pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
@@ -2164,7 +2166,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
*/
pteval = ptep_get_and_clear(mm, address, pvmw.pte);
- set_tlb_ubc_flush_pending(mm, pteval, address);
+ set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
} else {
pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
--
2.39.3 (Apple Git-146)
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 3/3] mm: Support batched unmap for lazyfree large folios during reclamation
2025-01-06 3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
2025-01-06 3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
2025-01-06 3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
@ 2025-01-06 3:17 ` Barry Song
2025-01-06 17:28 ` [PATCH 0/3] mm: batched unmap " Lorenzo Stoakes
3 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06 3:17 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song
From: Barry Song <v-songbaohua@oppo.com>
Currently, the PTEs and rmap of a large folio are removed one at a time.
This is not only slow but also causes the large folio to be unnecessarily
added to deferred_split, which can lead to races between the
deferred_split shrinker callback and memory reclamation. This patch
releases all PTEs and rmap entries in a batch.
For now, it only handles lazyfree large folios.
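A lazyfree folio is recognized here as one that is anon but no longer
swap-backed, and batching additionally requires every PTE of the folio
to be present and to fold into a single batch (condensed from
can_batch_unmap_folio_ptes() in the diff below):

	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
		return false;
	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags,
			       NULL, NULL, NULL) == max_nr;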
The microbenchmark below reclaims 128MiB of lazyfree large folios
whose sizes are 64KiB:
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#include <time.h>

#define SIZE 128*1024*1024 // 128 MB

unsigned long read_split_deferred()
{
	FILE *file = fopen("/sys/kernel/mm/transparent_hugepage"
			   "/hugepages-64kB/stats/split_deferred", "r");
	if (!file) {
		perror("Error opening file");
		return 0;
	}

	unsigned long value;
	if (fscanf(file, "%lu", &value) != 1) {
		perror("Error reading value");
		fclose(file);
		return 0;
	}

	fclose(file);
	return value;
}

int main(int argc, char *argv[])
{
	while (1) {
		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		memset((void *)p, 1, SIZE);
		madvise((void *)p, SIZE, MADV_FREE);

		clock_t start_time = clock();
		unsigned long start_split = read_split_deferred();
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		clock_t end_time = clock();
		unsigned long end_split = read_split_deferred();

		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
		printf("Time taken by reclamation: %f seconds, split_deferred: %ld\n",
		       elapsed_time, end_split - start_split);

		munmap((void *)p, SIZE);
	}
	return 0;
}
w/o patch:
~ # ./a.out
Time taken by reclamation: 0.177418 seconds, split_deferred: 2048
Time taken by reclamation: 0.178348 seconds, split_deferred: 2048
Time taken by reclamation: 0.174525 seconds, split_deferred: 2048
Time taken by reclamation: 0.171620 seconds, split_deferred: 2048
Time taken by reclamation: 0.172241 seconds, split_deferred: 2048
Time taken by reclamation: 0.174003 seconds, split_deferred: 2048
Time taken by reclamation: 0.171058 seconds, split_deferred: 2048
Time taken by reclamation: 0.171993 seconds, split_deferred: 2048
Time taken by reclamation: 0.169829 seconds, split_deferred: 2048
Time taken by reclamation: 0.172895 seconds, split_deferred: 2048
Time taken by reclamation: 0.176063 seconds, split_deferred: 2048
Time taken by reclamation: 0.172568 seconds, split_deferred: 2048
Time taken by reclamation: 0.171185 seconds, split_deferred: 2048
Time taken by reclamation: 0.170632 seconds, split_deferred: 2048
Time taken by reclamation: 0.170208 seconds, split_deferred: 2048
Time taken by reclamation: 0.174192 seconds, split_deferred: 2048
...
w/ patch:
~ # ./a.out
Time taken by reclamation: 0.074231 seconds, split_deferred: 0
Time taken by reclamation: 0.071026 seconds, split_deferred: 0
Time taken by reclamation: 0.072029 seconds, split_deferred: 0
Time taken by reclamation: 0.071873 seconds, split_deferred: 0
Time taken by reclamation: 0.073573 seconds, split_deferred: 0
Time taken by reclamation: 0.071906 seconds, split_deferred: 0
Time taken by reclamation: 0.073604 seconds, split_deferred: 0
Time taken by reclamation: 0.075903 seconds, split_deferred: 0
Time taken by reclamation: 0.073191 seconds, split_deferred: 0
Time taken by reclamation: 0.071228 seconds, split_deferred: 0
Time taken by reclamation: 0.071391 seconds, split_deferred: 0
Time taken by reclamation: 0.071468 seconds, split_deferred: 0
Time taken by reclamation: 0.071896 seconds, split_deferred: 0
Time taken by reclamation: 0.072508 seconds, split_deferred: 0
Time taken by reclamation: 0.071884 seconds, split_deferred: 0
Time taken by reclamation: 0.072433 seconds, split_deferred: 0
Time taken by reclamation: 0.071939 seconds, split_deferred: 0
...
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
mm/rmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 42 insertions(+), 6 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 365112af5291..9424b96f8482 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1642,6 +1642,27 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
#endif
}
+/* We support batch unmapping of PTEs for lazyfree large folios */
+static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
+ struct folio *folio, pte_t *ptep)
+{
+ const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+ int max_nr = folio_nr_pages(folio);
+ pte_t pte = ptep_get(ptep);
+
+ if (pte_none(pte))
+ return false;
+ if (!pte_present(pte))
+ return false;
+ if (!folio_test_anon(folio))
+ return false;
+ if (folio_test_swapbacked(folio))
+ return false;
+
+ return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
+ NULL, NULL) == max_nr;
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
@@ -1655,6 +1676,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
bool anon_exclusive, ret = true;
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;
+ int nr_pages = 1;
unsigned long pfn;
unsigned long hsz = 0;
@@ -1780,6 +1802,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
hugetlb_vma_unlock_write(vma);
}
pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+ } else if (folio_test_large(folio) &&
+ can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) {
+ nr_pages = folio_nr_pages(folio);
+ flush_cache_range(vma, range.start, range.end);
+ pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+ if (should_defer_flush(mm, flags))
+ set_tlb_ubc_flush_pending(mm, pteval, address, folio_size(folio));
+ else
+ flush_tlb_range(vma, range.start, range.end);
} else {
flush_cache_page(vma, address, pfn);
/* Nuke the page table entry. */
@@ -1875,7 +1906,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* redirtied either using the page table or a previously
* obtained GUP reference.
*/
- set_pte_at(mm, address, pvmw.pte, pteval);
+ set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
folio_set_swapbacked(folio);
goto walk_abort;
} else if (ref_count != 1 + map_count) {
@@ -1888,10 +1919,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* We'll come back here later and detect if the folio was
* dirtied when the additional reference is gone.
*/
- set_pte_at(mm, address, pvmw.pte, pteval);
+ set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
goto walk_abort;
}
- dec_mm_counter(mm, MM_ANONPAGES);
+ add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
goto discard;
}
@@ -1943,13 +1974,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
dec_mm_counter(mm, mm_counter_file(folio));
}
discard:
- if (unlikely(folio_test_hugetlb(folio)))
+ if (unlikely(folio_test_hugetlb(folio))) {
hugetlb_remove_rmap(folio);
- else
- folio_remove_rmap_pte(folio, subpage, vma);
+ } else {
+ folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+ folio_ref_sub(folio, nr_pages - 1);
+ }
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
+ /* We have already batched the entire folio */
+ if (nr_pages > 1)
+ goto walk_done;
continue;
walk_abort:
ret = false;
--
2.39.3 (Apple Git-146)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
@ 2025-01-06 6:40 ` Baolin Wang
2025-01-06 9:03 ` Barry Song
0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2025-01-06 6:40 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: linux-arm-kernel, x86, linux-kernel, ioworker0, david,
ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
Barry Song
On 2025/1/6 11:17, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> The refcount may be temporarily or long-term increased, but this does
> not change the fundamental nature of the folio already being lazy-
> freed. Therefore, we only reset 'swapbacked' when we are certain the
> folio is dirty and not droppable.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
The changes look good to me. While we are at it, could you also change
the __discard_anon_folio_pmd_locked() to follow the same strategy for
lazy-freed PMD-sized folio?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
2025-01-06 3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
@ 2025-01-06 8:22 ` kernel test robot
2025-01-13 0:55 ` Barry Song
2025-01-06 10:07 ` kernel test robot
1 sibling, 1 reply; 19+ messages in thread
From: kernel test robot @ 2025-01-06 8:22 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: oe-kbuild-all, linux-arm-kernel, x86, linux-kernel, ioworker0,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Anshuman Khandual, Shaoqin Huang, Gavin Shan,
Kefeng Wang, Mark Rutland, Kirill A. Shutemov, Yosry Ahmed
Hi Barry,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
config: i386-buildonly-randconfig-002-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501061535.zx9E486H-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/x86/include/asm/uaccess.h:17,
from include/linux/uaccess.h:12,
from include/linux/sched/task.h:13,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:33,
from include/linux/cgroup.h:17,
from include/linux/memcontrol.h:13,
from include/linux/swap.h:9,
from include/linux/suspend.h:5,
from arch/x86/kernel/asm-offsets.c:14:
>> arch/x86/include/asm/tlbflush.h:283:46: error: unknown type name 'unsignd'; did you mean 'unsigned'?
283 | unsignd long size)
| ^~~~~~~
| unsigned
make[3]: *** [scripts/Makefile.build:102: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=998720002
make[3]: Target 'prepare' not remade because of errors.
make[2]: *** [Makefile:1263: prepare0] Error 2 shuffle=998720002
make[2]: Target 'prepare' not remade because of errors.
make[1]: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
make: Target 'prepare' not remade because of errors.
vim +283 arch/x86/include/asm/tlbflush.h
279
280 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
281 struct mm_struct *mm,
282 unsigned long uaddr,
> 283 unsignd long size)
284 {
285 inc_mm_tlb_gen(mm);
286 cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
287 mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
288 }
289
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 6:40 ` Baolin Wang
@ 2025-01-06 9:03 ` Barry Song
2025-01-06 9:34 ` Baolin Wang
0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-06 9:03 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
Barry Song
On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 11:17, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > The refcount may be temporarily or long-term increased, but this does
> > not change the fundamental nature of the folio already being lazy-
> > freed. Therefore, we only reset 'swapbacked' when we are certain the
> > folio is dirty and not droppable.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>
> The changes look good to me. While we are at it, could you also change
> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> lazy-freed PMD-sized folio?
it seems you mean __discard_anon_folio_pmd_locked() is lacking
folio_set_swapbacked(folio) for dirty pmd-mapped folios?
and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
handled properly?
Thanks
barry
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 9:03 ` Barry Song
@ 2025-01-06 9:34 ` Baolin Wang
2025-01-06 14:39 ` Lance Yang
0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2025-01-06 9:34 UTC (permalink / raw)
To: Barry Song
Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
Barry Song
On 2025/1/6 17:03, Barry Song wrote:
> On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/1/6 11:17, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> The refcount may be temporarily or long-term increased, but this does
>>> not change the fundamental nature of the folio already being lazy-
>>> freed. Therefore, we only reset 'swapbacked' when we are certain the
>>> folio is dirty and not droppable.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>
>> The changes look good to me. While we are at it, could you also change
>> the __discard_anon_folio_pmd_locked() to follow the same strategy for
>> lazy-freed PMD-sized folio?
>
> it seems you mean __discard_anon_folio_pmd_locked() is lacking
> folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> handled properly?
Right.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
2025-01-06 3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
2025-01-06 8:22 ` kernel test robot
@ 2025-01-06 10:07 ` kernel test robot
2025-01-13 0:56 ` Barry Song
1 sibling, 1 reply; 19+ messages in thread
From: kernel test robot @ 2025-01-06 10:07 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: oe-kbuild-all, linux-arm-kernel, x86, linux-kernel, ioworker0,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Anshuman Khandual, Shaoqin Huang, Gavin Shan,
Kefeng Wang, Mark Rutland, Kirill A. Shutemov, Yosry Ahmed
Hi Barry,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
config: riscv-randconfig-001-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501061736.FoHcInHJ-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/rmap.c: In function 'set_tlb_ubc_flush_pending':
>> mm/rmap.c:685:9: error: too many arguments to function 'arch_tlbbatch_add_pending'
685 | arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from arch/riscv/include/asm/pgtable.h:113,
from include/linux/pgtable.h:6,
from include/linux/mm.h:30,
from mm/rmap.c:55:
arch/riscv/include/asm/tlbflush.h:62:6: note: declared here
62 | void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
| ^~~~~~~~~~~~~~~~~~~~~~~~~
vim +/arch_tlbbatch_add_pending +685 mm/rmap.c
663
664 /*
665 * Bits 0-14 of mm->tlb_flush_batched record pending generations.
666 * Bits 16-30 of mm->tlb_flush_batched bit record flushed generations.
667 */
668 #define TLB_FLUSH_BATCH_FLUSHED_SHIFT 16
669 #define TLB_FLUSH_BATCH_PENDING_MASK \
670 ((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1)
671 #define TLB_FLUSH_BATCH_PENDING_LARGE \
672 (TLB_FLUSH_BATCH_PENDING_MASK / 2)
673
674 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
675 unsigned long uaddr,
676 unsigned long size)
677 {
678 struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
679 int batch;
680 bool writable = pte_dirty(pteval);
681
682 if (!pte_accessible(mm, pteval))
683 return;
684
> 685 arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
686 tlb_ubc->flush_required = true;
687
688 /*
689 * Ensure compiler does not re-order the setting of tlb_flush_batched
690 * before the PTE is cleared.
691 */
692 barrier();
693 batch = atomic_read(&mm->tlb_flush_batched);
694 retry:
695 if ((batch & TLB_FLUSH_BATCH_PENDING_MASK) > TLB_FLUSH_BATCH_PENDING_LARGE) {
696 /*
697 * Prevent `pending' from catching up with `flushed' because of
698 * overflow. Reset `pending' and `flushed' to be 1 and 0 if
699 * `pending' becomes large.
700 */
701 if (!atomic_try_cmpxchg(&mm->tlb_flush_batched, &batch, 1))
702 goto retry;
703 } else {
704 atomic_inc(&mm->tlb_flush_batched);
705 }
706
707 /*
708 * If the PTE was dirty then it's best to assume it's writable. The
709 * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
710 * before the page is queued for IO.
711 */
712 if (writable)
713 tlb_ubc->writable = true;
714 }
715
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 9:34 ` Baolin Wang
@ 2025-01-06 14:39 ` Lance Yang
2025-01-06 20:52 ` Barry Song
2025-01-07 1:33 ` Lance Yang
0 siblings, 2 replies; 19+ messages in thread
From: Lance Yang @ 2025-01-06 14:39 UTC (permalink / raw)
To: Baolin Wang
Cc: Barry Song, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
Barry Song
On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 17:03, Barry Song wrote:
> > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 2025/1/6 11:17, Barry Song wrote:
> >>> From: Barry Song <v-songbaohua@oppo.com>
> >>>
> >>> The refcount may be temporarily or long-term increased, but this does
> >>> not change the fundamental nature of the folio already being lazy-
> >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> >>> folio is dirty and not droppable.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>
> >> The changes look good to me. While we are at it, could you also change
> >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> >> lazy-freed PMD-sized folio?
> >
> > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
Good catch!
Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
THPs in __discard_anon_folio_pmd_locked() - possibly to align with
previous behavior ;)
If a dirty PMD-mapped THP cannot be discarded, we just split it and
restart the page walk to process the PTE-mapped THP. After that, we
will only mark each folio within the THP as swap-backed individually.
It seems like we could cut the work by calling folio_set_swapbacked()
for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
the restart of the page walk after splitting the THP, IMHO ;)
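Something along these lines, purely as a hypothetical sketch (the exact
placement and return convention inside __discard_anon_folio_pmd_locked()
are assumptions, not tested code):

	/* hypothetical: once the PMD-mapped THP turns out to be dirty
	 * and not VM_DROPPABLE, mark it swap-backed right here */
	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
		folio_set_swapbacked(folio);
		return false;	/* keep the folio; skip split + re-walk */
	}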
Thanks,
Lance
> > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > handled properly?
>
> Right.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation
2025-01-06 3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
` (2 preceding siblings ...)
2025-01-06 3:17 ` [PATCH 3/3] mm: Support batched unmap for lazyfree large folios during reclamation Barry Song
@ 2025-01-06 17:28 ` Lorenzo Stoakes
2025-01-06 19:15 ` Barry Song
3 siblings, 1 reply; 19+ messages in thread
From: Lorenzo Stoakes @ 2025-01-06 17:28 UTC (permalink / raw)
To: Barry Song
Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song
On Mon, Jan 06, 2025 at 04:17:08PM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
> shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c.
> However, those folios are still added to the deferred_split list in
> try_to_unmap_one() because we are unmapping PTEs and removing rmap entries
> one by one. This approach is not only slow but also increases the risk of a
> race condition where lazyfree folios are incorrectly set back to swapbacked,
> as a speculative folio_get may occur in the shrinker's callback.
>
> This patchset addresses the issue by only marking truly dirty folios as
> swapbacked, as suggested by David, and by shifting to batched unmapping
> of the entire folio in try_to_unmap_one(). As a result, we've observed
> deferred_split dropping to zero and significant performance improvements
> in memory reclamation.
You've not provided any numbers? What performance improvements? Under what
workloads?
You're adding a bunch of complexity here, so I feel like we need to see
some numbers, background, etc.?
Thanks!
>
> Barry Song (3):
> mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
> mm: Support tlbbatch flush for a range of PTEs
> mm: Support batched unmap for lazyfree large folios during reclamation
>
> arch/arm64/include/asm/tlbflush.h | 26 ++++----
> arch/arm64/mm/contpte.c | 2 +-
> arch/x86/include/asm/tlbflush.h | 3 +-
> mm/rmap.c | 103 ++++++++++++++++++++----------
> 4 files changed, 85 insertions(+), 49 deletions(-)
>
> --
> 2.39.3 (Apple Git-146)
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation
2025-01-06 17:28 ` [PATCH 0/3] mm: batched unmap " Lorenzo Stoakes
@ 2025-01-06 19:15 ` Barry Song
0 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06 19:15 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, linux-mm, linux-arm-kernel, x86, linux-kernel, ioworker0,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
baolin.wang, Barry Song
On Tue, Jan 7, 2025 at 6:28 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Mon, Jan 06, 2025 at 04:17:08PM +1300, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
> > shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c.
> > However, those folios are still added to the deferred_split list in
> > try_to_unmap_one() because we are unmapping PTEs and removing rmap entries
> > one by one. This approach is not only slow but also increases the risk of a
> > race condition where lazyfree folios are incorrectly set back to swapbacked,
> > as a speculative folio_get may occur in the shrinker's callback.
> >
> > This patchset addresses the issue by only marking truly dirty folios as
> > swapbacked, as suggested by David, and by shifting to batched unmapping
> > of the entire folio in try_to_unmap_one(). As a result, we've observed
> > deferred_split dropping to zero and significant performance improvements
> > in memory reclamation.
>
> You've not provided any numbers? What performance improvements? Under what
> workloads?
The numbers can be found in patch 3/3 at the following link:
https://lore.kernel.org/linux-mm/20250106031711.82855-4-21cnbao@gmail.com/
Reclaiming lazyfree mTHP will now be significantly faster.
Additionally, this patch addresses the issue of the misleading
split_deferred counter. The counter was intended to track operations like
unaligned unmap/madvise, but in practice the majority of split_deferred
cases resulted from memory reclamation of aligned lazyfree mTHP, which
made the counter highly misleading.
>
> You're adding a bunch of complexity here, so I feel like we need to see
> some numbers, background, etc.?
I agree that I can provide more details in v2. In the meantime, you can
find additional background information here:
https://lore.kernel.org/linux-mm/CAGsJ_4wOL6TLa3FKQASdrGfuqqu=14EuxAtpKmnebiGLm0dnfA@mail.gmail.com/
>
> Thanks!
>
> >
> > Barry Song (3):
> > mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
> > mm: Support tlbbatch flush for a range of PTEs
> > mm: Support batched unmap for lazyfree large folios during reclamation
> >
> > arch/arm64/include/asm/tlbflush.h | 26 ++++----
> > arch/arm64/mm/contpte.c | 2 +-
> > arch/x86/include/asm/tlbflush.h | 3 +-
> > mm/rmap.c | 103 ++++++++++++++++++++----------
> > 4 files changed, 85 insertions(+), 49 deletions(-)
> >
> > --
> > 2.39.3 (Apple Git-146)
Thanks
Barry
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 14:39 ` Lance Yang
@ 2025-01-06 20:52 ` Barry Song
2025-01-06 20:56 ` Barry Song
2025-01-07 1:33 ` Lance Yang
1 sibling, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-06 20:52 UTC (permalink / raw)
To: Lance Yang
Cc: Baolin Wang, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
Barry Song
On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)
Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and
then iterates over each PTE. I’m not sure why it’s designed this way—could
there be a specific reason behind this approach?
However, it does appear to handle folio_set_swapbacked() correctly, as only
a dirty PMD will result in dirty PTEs being generated in
__split_huge_pmd_locked():
} else {
pte_t entry;
entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
if (write)
entry = pte_mkwrite(entry, vma);
if (!young)
entry = pte_mkold(entry);
/* NOTE: this may set soft-dirty too on some archs */
if (dirty)
entry = pte_mkdirty(entry);
if (soft_dirty)
entry = pte_mksoft_dirty(entry);
if (uffd_wp)
entry = pte_mkuffd_wp(entry);
for (i = 0; i < HPAGE_PMD_NR; i++)
VM_WARN_ON(!pte_none(ptep_get(pte + i)));
set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
}
>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.
Thanks
Barry
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 20:52 ` Barry Song
@ 2025-01-06 20:56 ` Barry Song
0 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2025-01-06 20:56 UTC (permalink / raw)
To: Lance Yang
Cc: Baolin Wang, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
david, ryan.roberts, zhengtangquan, kasong, chrisl, Barry Song,
Ying Huang
On Tue, Jan 7, 2025 at 9:52 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2025/1/6 17:03, Barry Song wrote:
> > > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > > <baolin.wang@linux.alibaba.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2025/1/6 11:17, Barry Song wrote:
> > > >>> From: Barry Song <v-songbaohua@oppo.com>
> > > >>>
> > > >>> The refcount may be temporarily or long-term increased, but this does
> > > >>> not change the fundamental nature of the folio already being lazy-
> > > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > > >>> folio is dirty and not droppable.
> > > >>>
> > > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > >>
> > > >> The changes look good to me. While we are at it, could you also change
> > > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > > >> lazy-freed PMD-sized folio?
> > > >
> > > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> >
> > Good catch!
> >
> > Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> > THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> > previous behavior ;)
> >
> > If a dirty PMD-mapped THP cannot be discarded, we just split it and
> > restart the page walk to process the PTE-mapped THP. After that, we
> > will only mark each folio within the THP as swap-backed individually.
> >
> > It seems like we could cut the work by calling folio_set_swapbacked()
> > for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> > the restart of the page walk after splitting the THP, IMHO ;)
>
> Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
> the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and
Apologies for the typo, I meant splitting a PMD-mapped THP into a PTE-mapped
THP.
> then iterates over each PTE. I’m not sure why it’s designed this way—could
> there be a specific reason behind this approach?
>
> However, it does appear to handle folio_set_swapbacked() correctly, as only
> a dirty PMD will result in dirty PTEs being generated in
> __split_huge_pmd_locked():
>
> } else {
> pte_t entry;
>
> entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
> if (write)
> entry = pte_mkwrite(entry, vma);
>
> if (!young)
> entry = pte_mkold(entry);
>
> /* NOTE: this may set soft-dirty too on some archs */
> if (dirty)
> entry = pte_mkdirty(entry);
>
> if (soft_dirty)
> entry = pte_mksoft_dirty(entry);
>
> if (uffd_wp)
> entry = pte_mkuffd_wp(entry);
>
> for (i = 0; i < HPAGE_PMD_NR; i++)
> VM_WARN_ON(!pte_none(ptep_get(pte + i)));
>
> set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
> }
>
>
>
> >
> > Thanks,
> > Lance
> >
> >
> > > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > > handled properly?
> >
> >
> > >
> > > Right.
>
> Thanks
> Barry
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
2025-01-06 14:39 ` Lance Yang
2025-01-06 20:52 ` Barry Song
@ 2025-01-07 1:33 ` Lance Yang
1 sibling, 0 replies; 19+ messages in thread
From: Lance Yang @ 2025-01-07 1:33 UTC (permalink / raw)
To: Baolin Wang
Cc: Barry Song, akpm, linux-mm, linux-arm-kernel, x86, linux-kernel,
david, ryan.roberts, zhengtangquan, ying.huang, kasong, chrisl,
Barry Song
On Mon, Jan 6, 2025 at 10:39 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)
To correct my earlier email:
folio_set_swapbacked() is only called in __discard_anon_folio_pmd_locked()
when '!(vma->vm_flags & VM_DROPPABLE)' is true, IIUC.
Thanks,
Lance
>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
2025-01-06 8:22 ` kernel test robot
@ 2025-01-13 0:55 ` Barry Song
2025-01-13 13:13 ` Oliver Sang
0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-13 0:55 UTC (permalink / raw)
To: lkp
Cc: 21cnbao, akpm, anshuman.khandual, baolin.wang, bp,
catalin.marinas, chrisl, dave.hansen, david, gshan, hpa,
ioworker0, kasong, kirill.shutemov, linux-arm-kernel,
linux-kernel, linux-mm, mark.rutland, mingo, oe-kbuild-all,
ryan.roberts, shahuang, tglx, v-songbaohua, wangkefeng.wang,
will, x86, ying.huang, yosryahmed, zhengtangquan
On Mon, Jan 6, 2025 at 9:23 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Barry,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on akpm-mm/mm-everything]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> config: i386-buildonly-randconfig-002-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202501061535.zx9E486H-lkp@intel.com/
Sorry. My bad, does the below fix the build?
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cda35f53f544..4b62a6329b8f 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -280,7 +280,7 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
struct mm_struct *mm,
unsigned long uaddr,
- unsignd long size)
+ unsigned long size)
{
inc_mm_tlb_gen(mm);
cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
>
> All errors (new ones prefixed by >>):
>
> In file included from arch/x86/include/asm/uaccess.h:17,
> from include/linux/uaccess.h:12,
> from include/linux/sched/task.h:13,
> from include/linux/sched/signal.h:9,
> from include/linux/rcuwait.h:6,
> from include/linux/percpu-rwsem.h:7,
> from include/linux/fs.h:33,
> from include/linux/cgroup.h:17,
> from include/linux/memcontrol.h:13,
> from include/linux/swap.h:9,
> from include/linux/suspend.h:5,
> from arch/x86/kernel/asm-offsets.c:14:
> >> arch/x86/include/asm/tlbflush.h:283:46: error: unknown type name 'unsignd'; did you mean 'unsigned'?
> 283 | unsignd long size)
> | ^~~~~~~
> | unsigned
> make[3]: *** [scripts/Makefile.build:102: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=998720002
> make[3]: Target 'prepare' not remade because of errors.
> make[2]: *** [Makefile:1263: prepare0] Error 2 shuffle=998720002
> make[2]: Target 'prepare' not remade because of errors.
> make[1]: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
> make[1]: Target 'prepare' not remade because of errors.
> make: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
> make: Target 'prepare' not remade because of errors.
>
>
> vim +283 arch/x86/include/asm/tlbflush.h
>
> 279
> 280 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> 281 struct mm_struct *mm,
> 282 unsigned long uaddr,
> > 283 unsignd long size)
> 284 {
> 285 inc_mm_tlb_gen(mm);
> 286 cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> 287 mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
> 288 }
> 289
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
2025-01-06 10:07 ` kernel test robot
@ 2025-01-13 0:56 ` Barry Song
2025-01-13 7:30 ` Oliver Sang
0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2025-01-13 0:56 UTC (permalink / raw)
To: lkp
Cc: 21cnbao, akpm, anshuman.khandual, baolin.wang, bp,
catalin.marinas, chrisl, dave.hansen, david, gshan, hpa,
ioworker0, kasong, kirill.shutemov, linux-arm-kernel,
linux-kernel, linux-mm, mark.rutland, mingo, oe-kbuild-all,
ryan.roberts, shahuang, tglx, v-songbaohua, wangkefeng.wang,
will, x86, ying.huang, yosryahmed, zhengtangquan
On Mon, Jan 6, 2025 at 11:08 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Barry,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on akpm-mm/mm-everything]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> config: riscv-randconfig-001-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/config)
> compiler: riscv64-linux-gcc (GCC) 14.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202501061736.FoHcInHJ-lkp@intel.com/
>
Sorry. My bad, does the below diff fix the build?
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 72e559934952..7f3ea687ce33 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -61,7 +61,8 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
bool arch_tlbbatch_should_defer(struct mm_struct *mm);
void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
struct mm_struct *mm,
- unsigned long uaddr);
+ unsigned long uaddr,
+ unsigned long size);
void arch_flush_tlb_batched_pending(struct mm_struct *mm);
void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 9b6e86ce3867..aeda64a36d50 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -187,7 +187,8 @@ bool arch_tlbbatch_should_defer(struct mm_struct *mm)
void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
struct mm_struct *mm,
- unsigned long uaddr)
+ unsigned long uaddr,
+ unsigned long size)
{
cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
}
> All errors (new ones prefixed by >>):
>
> mm/rmap.c: In function 'set_tlb_ubc_flush_pending':
> >> mm/rmap.c:685:9: error: too many arguments to function 'arch_tlbbatch_add_pending'
> 685 | arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~
> In file included from arch/riscv/include/asm/pgtable.h:113,
> from include/linux/pgtable.h:6,
> from include/linux/mm.h:30,
> from mm/rmap.c:55:
> arch/riscv/include/asm/tlbflush.h:62:6: note: declared here
> 62 | void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> vim +/arch_tlbbatch_add_pending +685 mm/rmap.c
>
> 663
> 664 /*
> 665 * Bits 0-14 of mm->tlb_flush_batched record pending generations.
> 666 * Bits 16-30 of mm->tlb_flush_batched bit record flushed generations.
> 667 */
> 668 #define TLB_FLUSH_BATCH_FLUSHED_SHIFT 16
> 669 #define TLB_FLUSH_BATCH_PENDING_MASK \
> 670 ((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1)
> 671 #define TLB_FLUSH_BATCH_PENDING_LARGE \
> 672 (TLB_FLUSH_BATCH_PENDING_MASK / 2)
> 673
> 674 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
> 675 unsigned long uaddr,
> 676 unsigned long size)
> 677 {
> 678 struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
> 679 int batch;
> 680 bool writable = pte_dirty(pteval);
> 681
> 682 if (!pte_accessible(mm, pteval))
> 683 return;
> 684
> > 685 arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
> 686 tlb_ubc->flush_required = true;
> 687
> 688 /*
> 689 * Ensure compiler does not re-order the setting of tlb_flush_batched
> 690 * before the PTE is cleared.
> 691 */
> 692 barrier();
> 693 batch = atomic_read(&mm->tlb_flush_batched);
> 694 retry:
> 695 if ((batch & TLB_FLUSH_BATCH_PENDING_MASK) > TLB_FLUSH_BATCH_PENDING_LARGE) {
> 696 /*
> 697 * Prevent `pending' from catching up with `flushed' because of
> 698 * overflow. Reset `pending' and `flushed' to be 1 and 0 if
> 699 * `pending' becomes large.
> 700 */
> 701 if (!atomic_try_cmpxchg(&mm->tlb_flush_batched, &batch, 1))
> 702 goto retry;
> 703 } else {
> 704 atomic_inc(&mm->tlb_flush_batched);
> 705 }
> 706
> 707 /*
> 708 * If the PTE was dirty then it's best to assume it's writable. The
> 709 * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
> 710 * before the page is queued for IO.
> 711 */
> 712 if (writable)
> 713 tlb_ubc->writable = true;
> 714 }
> 715
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 19+ messages in thread
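The point of the widened signature above is that one pending entry can now
cover every PTE of a large folio instead of one entry per page. A minimal
sketch of a try_to_unmap_one() call site driving it that way, where nr_pages
and the choice of get_and_clear_full_ptes() are illustrative assumptions
rather than quotes from the series:

	/* Clear all nr_pages PTEs of the folio in one go; the returned
	 * pteval accumulates the dirty/young bits of the whole range. */
	pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
	if (should_defer_flush(mm, flags))
		/* Queue a single ranged flush covering the folio ... */
		set_tlb_ubc_flush_pending(mm, pteval, address,
					  nr_pages * PAGE_SIZE);
	else
		/* ... or flush the range synchronously right away. */
		flush_tlb_range(vma, address, address + nr_pages * PAGE_SIZE);

Either way the deferred path ends in arch_tlbbatch_add_pending(), which is
why every arch that implements it needs the new size parameter, even when,
as in the riscv stub above, only the cpumask is used and the range itself
is ignored.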
* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
2025-01-13 0:56 ` Barry Song
@ 2025-01-13 7:30 ` Oliver Sang
0 siblings, 0 replies; 19+ messages in thread
From: Oliver Sang @ 2025-01-13 7:30 UTC (permalink / raw)
To: Barry Song
Cc: lkp, akpm, anshuman.khandual, baolin.wang, bp, catalin.marinas,
chrisl, dave.hansen, david, gshan, hpa, ioworker0, kasong,
kirill.shutemov, linux-arm-kernel, linux-kernel, linux-mm,
mark.rutland, mingo, oe-kbuild-all, ryan.roberts, shahuang, tglx,
v-songbaohua, wangkefeng.wang, will, x86, ying.huang, yosryahmed,
zhengtangquan, oliver.sang
hi, Barry,
On Mon, Jan 13, 2025 at 01:56:26PM +1300, Barry Song wrote:
> On Mon, Jan 6, 2025 at 11:08 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Barry,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on akpm-mm/mm-everything]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link: https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> > patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> > config: riscv-randconfig-001-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/config)
> > compiler: riscv64-linux-gcc (GCC) 14.2.0
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061736.FoHcInHJ-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202501061736.FoHcInHJ-lkp@intel.com/
> >
>
> Sorry, my bad. Does the below diff fix the build?
Yes, the below diff fixes the build. Thanks.
Tested-by: kernel test robot <oliver.sang@intel.com>
>
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 72e559934952..7f3ea687ce33 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -61,7 +61,8 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> bool arch_tlbbatch_should_defer(struct mm_struct *mm);
> void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> struct mm_struct *mm,
> - unsigned long uaddr);
> + unsigned long uaddr,
> + unsigned long size);
> void arch_flush_tlb_batched_pending(struct mm_struct *mm);
> void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 9b6e86ce3867..aeda64a36d50 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -187,7 +187,8 @@ bool arch_tlbbatch_should_defer(struct mm_struct *mm)
>
> void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> struct mm_struct *mm,
> - unsigned long uaddr)
> + unsigned long uaddr,
> + unsigned long size)
> {
> cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> }
>
> > All errors (new ones prefixed by >>):
> >
> > mm/rmap.c: In function 'set_tlb_ubc_flush_pending':
> > >> mm/rmap.c:685:9: error: too many arguments to function 'arch_tlbbatch_add_pending'
> > 685 | arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
> > | ^~~~~~~~~~~~~~~~~~~~~~~~~
> > In file included from arch/riscv/include/asm/pgtable.h:113,
> > from include/linux/pgtable.h:6,
> > from include/linux/mm.h:30,
> > from mm/rmap.c:55:
> > arch/riscv/include/asm/tlbflush.h:62:6: note: declared here
> > 62 | void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> > | ^~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >
> > vim +/arch_tlbbatch_add_pending +685 mm/rmap.c
> >
> > 663
> > 664 /*
> > 665 * Bits 0-14 of mm->tlb_flush_batched record pending generations.
> > 666 * Bits 16-30 of mm->tlb_flush_batched bit record flushed generations.
> > 667 */
> > 668 #define TLB_FLUSH_BATCH_FLUSHED_SHIFT 16
> > 669 #define TLB_FLUSH_BATCH_PENDING_MASK \
> > 670 ((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1)
> > 671 #define TLB_FLUSH_BATCH_PENDING_LARGE \
> > 672 (TLB_FLUSH_BATCH_PENDING_MASK / 2)
> > 673
> > 674 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
> > 675 unsigned long uaddr,
> > 676 unsigned long size)
> > 677 {
> > 678 struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
> > 679 int batch;
> > 680 bool writable = pte_dirty(pteval);
> > 681
> > 682 if (!pte_accessible(mm, pteval))
> > 683 return;
> > 684
> > > 685 arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
> > 686 tlb_ubc->flush_required = true;
> > 687
> > 688 /*
> > 689 * Ensure compiler does not re-order the setting of tlb_flush_batched
> > 690 * before the PTE is cleared.
> > 691 */
> > 692 barrier();
> > 693 batch = atomic_read(&mm->tlb_flush_batched);
> > 694 retry:
> > 695 if ((batch & TLB_FLUSH_BATCH_PENDING_MASK) > TLB_FLUSH_BATCH_PENDING_LARGE) {
> > 696 /*
> > 697 * Prevent `pending' from catching up with `flushed' because of
> > 698 * overflow. Reset `pending' and `flushed' to be 1 and 0 if
> > 699 * `pending' becomes large.
> > 700 */
> > 701 if (!atomic_try_cmpxchg(&mm->tlb_flush_batched, &batch, 1))
> > 702 goto retry;
> > 703 } else {
> > 704 atomic_inc(&mm->tlb_flush_batched);
> > 705 }
> > 706
> > 707 /*
> > 708 * If the PTE was dirty then it's best to assume it's writable. The
> > 709 * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
> > 710 * before the page is queued for IO.
> > 711 */
> > 712 if (writable)
> > 713 tlb_ubc->writable = true;
> > 714 }
> > 715
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
>
^ permalink raw reply [flat|nested] 19+ messages in thread
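For background on the retry loop in the quoted set_tlb_ubc_flush_pending():
mm->tlb_flush_batched packs two generation counters into a single atomic,
pending in bits 0-14 and flushed in bits 16-30. A reader decodes it roughly
as below; this sketch mirrors the quoted macros, and the helper name is made
up for illustration:

	static bool tlb_flush_outstanding(struct mm_struct *mm)
	{
		int batch = atomic_read(&mm->tlb_flush_batched);
		int pending = batch & TLB_FLUSH_BATCH_PENDING_MASK;
		int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;

		/* A deferred flush is still outstanding whenever the
		 * pending generation has run ahead of the flushed one. */
		return pending != flushed;
	}

The cmpxchg reset to 1 in the quoted code exists precisely so that this
comparison stays meaningful when the 15-bit pending counter would otherwise
wrap around and collide with the flushed field.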
* Re: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
2025-01-13 0:55 ` Barry Song
@ 2025-01-13 13:13 ` Oliver Sang
0 siblings, 0 replies; 19+ messages in thread
From: Oliver Sang @ 2025-01-13 13:13 UTC (permalink / raw)
To: Barry Song
Cc: lkp, akpm, anshuman.khandual, baolin.wang, bp, catalin.marinas,
chrisl, dave.hansen, david, gshan, hpa, ioworker0, kasong,
kirill.shutemov, linux-arm-kernel, linux-kernel, linux-mm,
mark.rutland, mingo, oe-kbuild-all, ryan.roberts, shahuang, tglx,
v-songbaohua, wangkefeng.wang, will, x86, ying.huang, yosryahmed,
zhengtangquan, oliver.sang
hi, Barry,
On Mon, Jan 13, 2025 at 01:55:04PM +1300, Barry Song wrote:
> On Mon, Jan 6, 2025 at 9:23 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Barry,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on akpm-mm/mm-everything]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Barry-Song/mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one/20250106-112638
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link: https://lore.kernel.org/r/20250106031711.82855-3-21cnbao%40gmail.com
> > patch subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
> > config: i386-buildonly-randconfig-002-20250106 (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/config)
> > compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250106/202501061535.zx9E486H-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202501061535.zx9E486H-lkp@intel.com/
>
> Sorry, my bad. Does the below fix the build?
Yes, the below diff fixes the build. Thanks.
Tested-by: kernel test robot <oliver.sang@intel.com>
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index cda35f53f544..4b62a6329b8f 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -280,7 +280,7 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
> static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> struct mm_struct *mm,
> unsigned long uaddr,
> - unsignd long size)
> + unsigned long size)
> {
> inc_mm_tlb_gen(mm);
> cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
>
> >
> > All errors (new ones prefixed by >>):
> >
> > In file included from arch/x86/include/asm/uaccess.h:17,
> > from include/linux/uaccess.h:12,
> > from include/linux/sched/task.h:13,
> > from include/linux/sched/signal.h:9,
> > from include/linux/rcuwait.h:6,
> > from include/linux/percpu-rwsem.h:7,
> > from include/linux/fs.h:33,
> > from include/linux/cgroup.h:17,
> > from include/linux/memcontrol.h:13,
> > from include/linux/swap.h:9,
> > from include/linux/suspend.h:5,
> > from arch/x86/kernel/asm-offsets.c:14:
> > >> arch/x86/include/asm/tlbflush.h:283:46: error: unknown type name 'unsignd'; did you mean 'unsigned'?
> > 283 | unsignd long size)
> > | ^~~~~~~
> > | unsigned
> > make[3]: *** [scripts/Makefile.build:102: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=998720002
> > make[3]: Target 'prepare' not remade because of errors.
> > make[2]: *** [Makefile:1263: prepare0] Error 2 shuffle=998720002
> > make[2]: Target 'prepare' not remade because of errors.
> > make[1]: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
> > make[1]: Target 'prepare' not remade because of errors.
> > make: *** [Makefile:251: __sub-make] Error 2 shuffle=998720002
> > make: Target 'prepare' not remade because of errors.
> >
> >
> > vim +283 arch/x86/include/asm/tlbflush.h
> >
> > 279
> > 280 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> > 281 struct mm_struct *mm,
> > 282 unsigned long uaddr,
> > > 283 unsignd long size)
> > 284 {
> > 285 inc_mm_tlb_gen(mm);
> > 286 cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> > 287 mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
> > 288 }
> > 289
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 19+ messages in thread
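Worth noting: both stubs in this thread ignore the new arguments. The x86
version above invalidates by bumping the mm's TLB generation, and the riscv
stub earlier only ORs in the cpumask, so for them the range is advisory. An
architecture that wanted to flush precisely could record the span along the
lines of this purely hypothetical sketch; the start/end fields are invented
here for illustration, not taken from any arch's arch_tlbflush_unmap_batch:

	static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
						     struct mm_struct *mm,
						     unsigned long uaddr,
						     unsigned long size)
	{
		/* Hypothetical fields: widen the recorded VA span so that
		 * arch_tlbbatch_flush() can later target just this range. */
		batch->start = min(batch->start, uaddr);
		batch->end = max(batch->end, uaddr + size);
		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
	}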
end of thread, other threads:[~2025-01-13 13:13 UTC | newest]
Thread overview: 19+ messages
2025-01-06 3:17 [PATCH 0/3] mm: batched unmap lazyfree large folios during reclamation Barry Song
2025-01-06 3:17 ` [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one Barry Song
2025-01-06 6:40 ` Baolin Wang
2025-01-06 9:03 ` Barry Song
2025-01-06 9:34 ` Baolin Wang
2025-01-06 14:39 ` Lance Yang
2025-01-06 20:52 ` Barry Song
2025-01-06 20:56 ` Barry Song
2025-01-07 1:33 ` Lance Yang
2025-01-06 3:17 ` [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs Barry Song
2025-01-06 8:22 ` kernel test robot
2025-01-13 0:55 ` Barry Song
2025-01-13 13:13 ` Oliver Sang
2025-01-06 10:07 ` kernel test robot
2025-01-13 0:56 ` Barry Song
2025-01-13 7:30 ` Oliver Sang
2025-01-06 3:17 ` [PATCH 3/3] mm: Support batched unmap for lazyfree large folios during reclamation Barry Song
2025-01-06 17:28 ` [PATCH 0/3] mm: batched unmap " Lorenzo Stoakes
2025-01-06 19:15 ` Barry Song