linux-mm.kvack.org archive mirror
From: Barry Song <21cnbao@gmail.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: akpm@linux-foundation.org, david@kernel.org,
	catalin.marinas@arm.com,  will@kernel.org,
	lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
	 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com,  mhocko@suse.com, riel@surriel.com,
	harry.yoo@oracle.com, jannh@google.com,  willy@infradead.org,
	dev.jain@arm.com, linux-mm@kvack.org,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm: rmap: support batched checks of the references for large folios
Date: Sat, 7 Mar 2026 05:07:13 +0800	[thread overview]
Message-ID: <CAGsJ_4yEWn3kVZPQZatOJX7NLLv-OK9pAqUDuWz25XxG5hpV=w@mail.gmail.com> (raw)
In-Reply-To: <12132694536834262062d1fb304f8f8a064b6750.1770645603.git.baolin.wang@linux.alibaba.com>

On Mon, Feb 9, 2026 at 10:07 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> Currently, folio_referenced_one() always checks the young flag for each PTE
> sequentially, which is inefficient for large folios. This inefficiency is
> especially noticeable when reclaiming clean file-backed large folios, where
> folio_referenced() is observed as a significant performance hotspot.
>
> Moreover, the Arm64 architecture, which supports contiguous PTEs, already has
> an optimization to clear the young flags for PTEs within a contiguous range.
> However, this is not sufficient: we can extend it to perform batched operations
> over the entire large folio (which might exceed the contiguous range, CONT_PTE_SIZE).
>
> Introduce a new API, clear_flush_young_ptes(), to facilitate batched checking
> of the young flags and flushing of TLB entries, thereby improving performance
> during large folio reclamation. It will be overridden by architectures that
> implement a more efficient batched operation in the following patches.
>
> While we are at it, rename ptep_clear_flush_young_notify() to
> clear_flush_young_ptes_notify() to indicate that this is a batch operation.
>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

LGTM,

Reviewed-by: Barry Song <baohua@kernel.org>

> ---
>  include/linux/mmu_notifier.h |  9 +++++----
>  include/linux/pgtable.h      | 35 +++++++++++++++++++++++++++++++++++
>  mm/rmap.c                    | 28 +++++++++++++++++++++++++---
>  3 files changed, 65 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index d1094c2d5fb6..07a2bbaf86e9 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -515,16 +515,17 @@ static inline void mmu_notifier_range_init_owner(
>         range->owner = owner;
>  }
>
> -#define ptep_clear_flush_young_notify(__vma, __address, __ptep)                \
> +#define clear_flush_young_ptes_notify(__vma, __address, __ptep, __nr)  \
>  ({                                                                     \
>         int __young;                                                    \
>         struct vm_area_struct *___vma = __vma;                          \
>         unsigned long ___address = __address;                           \
> -       __young = ptep_clear_flush_young(___vma, ___address, __ptep);   \
> +       unsigned int ___nr = __nr;                                      \
> +       __young = clear_flush_young_ptes(___vma, ___address, __ptep, ___nr);    \
>         __young |= mmu_notifier_clear_flush_young(___vma->vm_mm,        \
>                                                   ___address,           \
>                                                   ___address +          \
> -                                                       PAGE_SIZE);     \
> +                                                 ___nr * PAGE_SIZE);   \
>         __young;                                                        \
>  })
>
> @@ -650,7 +651,7 @@ static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
>
>  #define mmu_notifier_range_update_to_read_only(r) false
>
> -#define ptep_clear_flush_young_notify ptep_clear_flush_young
> +#define clear_flush_young_ptes_notify clear_flush_young_ptes
>  #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
>  #define ptep_clear_young_notify ptep_test_and_clear_young
>  #define pmdp_clear_young_notify pmdp_test_and_clear_young
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 21b67d937555..a50df42a893f 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1068,6 +1068,41 @@ static inline void wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
>  }
>  #endif
>
> +#ifndef clear_flush_young_ptes
> +/**
> + * clear_flush_young_ptes - Mark PTEs that map consecutive pages of the same
> + *                         folio as old and flush the TLB.
> + * @vma: The virtual memory area the pages are mapped into.
> + * @addr: Address the first page is mapped at.
> + * @ptep: Page table pointer for the first entry.
> + * @nr: Number of entries to clear access bit.
> + *
> + * May be overridden by the architecture; otherwise, implemented as a simple
> + * loop over ptep_clear_flush_young().
> + *
> + * Note that PTE bits in the PTE range besides the PFN can differ. For example,
> + * some PTEs might be write-protected.
> + *
> + * Context: The caller holds the page table lock.  The PTEs map consecutive
> + * pages that belong to the same folio.  The PTEs are all in the same PMD.
> + */
> +static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
> +               unsigned long addr, pte_t *ptep, unsigned int nr)
> +{
> +       int young = 0;
> +
> +       for (;;) {
> +               young |= ptep_clear_flush_young(vma, addr, ptep);
> +               if (--nr == 0)
> +                       break;
> +               ptep++;
> +               addr += PAGE_SIZE;
> +       }
> +
> +       return young;
> +}
> +#endif

We might have an opportunity to batch the TLB synchronization by using
flush_tlb_range() instead of calling flush_tlb_page() one entry at a
time. I'm not sure the benefit would be significant, though, especially
if only one entry among the nr entries has the young bit set.
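For illustration, a variant of the generic fallback along those lines would
test-and-clear each entry without flushing, then issue a single ranged flush at
the end. Rough sketch only, not compile-tested against any tree; the function
name matches the patch, but this body is hypothetical:

```c
/*
 * Hypothetical alternative to the generic fallback in this patch:
 * clear the young bits without per-entry flushes, then synchronize
 * the whole range once at the end.  Sketch only.
 */
static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
		unsigned long addr, pte_t *ptep, unsigned int nr)
{
	unsigned long start = addr;
	int young = 0;
	unsigned int i;

	/* Clear the access bits first; no TLB flush per entry. */
	for (i = 0; i < nr; i++, ptep++, addr += PAGE_SIZE)
		young |= ptep_test_and_clear_young(vma, addr, ptep);

	/* One ranged flush instead of up to nr single-page flushes. */
	if (young)
		flush_tlb_range(vma, start, start + nr * PAGE_SIZE);

	return young;
}
```

Whether this wins depends on how many entries were actually young: if only one
of the nr entries had the bit set, the ranged flush may cost more than a single
flush_tlb_page(), which is the trade-off noted above.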

Best Regards
Barry


Thread overview: 16+ messages
2026-02-09 14:07 [PATCH v6 0/5] support batch checking of references and unmapping " Baolin Wang
2026-02-09 14:07 ` [PATCH v6 1/5] mm: rmap: support batched checks of the references " Baolin Wang
2026-02-09 15:25   ` David Hildenbrand (Arm)
2026-03-06 21:07   ` Barry Song [this message]
2026-03-07  2:22     ` Baolin Wang
2026-02-09 14:07 ` [PATCH v6 2/5] arm64: mm: factor out the address and ptep alignment into a new helper Baolin Wang
2026-02-09 14:07 ` [PATCH v6 3/5] arm64: mm: support batch clearing of the young flag for large folios Baolin Wang
2026-02-09 14:07 ` [PATCH v6 4/5] arm64: mm: implement the architecture-specific clear_flush_young_ptes() Baolin Wang
2026-02-09 15:30   ` David Hildenbrand (Arm)
2026-02-10  0:39     ` Baolin Wang
2026-03-06 21:20   ` Barry Song
2026-03-07  2:14     ` Baolin Wang
2026-02-09 14:07 ` [PATCH v6 5/5] mm: rmap: support batched unmapping for file large folios Baolin Wang
2026-02-09 15:31   ` David Hildenbrand (Arm)
2026-02-10  1:53 ` [PATCH v6 0/5] support batch checking of references and unmapping for " Andrew Morton
2026-02-10  2:01   ` Baolin Wang
