linux-mm.kvack.org archive mirror
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>, akpm@linux-foundation.org
Cc: catalin.marinas@arm.com, will@kernel.org,
	lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, riel@surriel.com,
	harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
	baohua@kernel.org, dev.jain@arm.com, axelrasmussen@google.com,
	yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
	zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
	linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] mm: add a batched helper to clear the young flag for large folios
Date: Thu, 26 Feb 2026 11:42:12 +0800	[thread overview]
Message-ID: <32c538ce-6af8-48a8-86fc-d26ee253af54@linux.alibaba.com> (raw)
In-Reply-To: <d172d6bf-c60c-4cf5-9da9-f30de38cdfed@kernel.org>



On 2/25/26 10:04 PM, David Hildenbrand (Arm) wrote:
> On 2/24/26 02:56, Baolin Wang wrote:
>> Currently, MGLRU calls ptep_clear_young_notify() to check and clear the
>> young flag for each PTE sequentially, which is inefficient for large folio
>> reclamation.
>>
>> Moreover, on the Arm64 architecture, which supports contiguous PTEs, the
>> Arm64-specific ptep_test_and_clear_young() already implements an optimization
>> that clears the young flags for PTEs within a contiguous range. However, this
>> is not sufficient. Similar to the Arm64-specific clear_flush_young_ptes(), we
>> can extend this to perform a batched operation over the entire large folio
>> (which might exceed the contiguous range: CONT_PTE_SIZE).
>>
>> Thus, introduce a new batched helper, test_and_clear_young_ptes(), and its
>> wrapper clear_young_ptes_notify(), to perform batched checking and clearing of
>> the young flags for large folios, which helps improve performance during large
>> folio reclamation when MGLRU is enabled. It will be overridden by architectures
>> that implement a more efficient batched operation in the following patches.
>>
> 
> Maybe mention that the implementation follows the other existing functions.

Ack.

>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   include/linux/pgtable.h | 36 ++++++++++++++++++++++++++++++++++++
>>   mm/internal.h           | 23 ++++++++++++++++++-----
>>   2 files changed, 54 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index 776993d4567b..0bcd3be524d3 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -1103,6 +1103,42 @@ static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
>>   }
>>   #endif
>>   
>> +#ifndef test_and_clear_young_ptes
>> +/**
>> + * test_and_clear_young_ptes - Mark PTEs that map consecutive pages of the same
>> + *			       folio as old
>> + * @vma: The virtual memory area the pages are mapped into.
>> + * @addr: Address the first page is mapped at.
>> + * @ptep: Page table pointer for the first entry.
>> + * @nr: Number of entries to clear access bit.
>> + *
>> + * May be overridden by the architecture; otherwise, implemented as a simple
>> + * loop over ptep_test_and_clear_young().
>> + *
>> + * Note that PTE bits in the PTE range besides the PFN can differ. For example,
>> + * some PTEs might be write-protected.
> 
> Document the return value?
> 
> Returns: whether any PTE was young.

Ack.

> 
> Or sth like that.
> 
>> + *
>> + * Context: The caller holds the page table lock.  The PTEs map consecutive
>> + * pages that belong to the same folio.  The PTEs are all in the same PMD.
>> + */
>> +static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
>> +					    unsigned long addr, pte_t *ptep,
>> +					    unsigned int nr)
> 
> Two tabs ...

Ack.

> 
>> +{
>> +	int young = 0;
>> +
>> +	for (;;) {
>> +		young |= ptep_test_and_clear_young(vma, addr, ptep);
>> +		if (--nr == 0)
>> +			break;
>> +		ptep++;
>> +		addr += PAGE_SIZE;
>> +	}
>> +
>> +	return young;
> 
> BTW: can this function simply return (and use) a bool instead?
> 
> Likely we should do the same for the other functions, but that can be
> done separately.

Yes, I'll add this to my TODO list and convert all the related functions.
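For reference, the bool-returning shape suggested above can be modeled in
plain userspace C. This is only an illustrative sketch, not kernel code: the
pte_t is reduced to a uint64_t, PTE_AF is an arbitrary stand-in for the
hardware accessed bit, and the vma/addr parameters of the real helpers are
omitted since nothing here walks a real page table:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the hardware accessed ("young") bit. */
#define PTE_AF (1ULL << 10)

/* Model of a single-entry test-and-clear, in the spirit of
 * ptep_test_and_clear_young(): report whether the entry was young,
 * and mark it old. */
static bool pte_test_and_clear_young(uint64_t *ptep)
{
	bool young = (*ptep & PTE_AF) != 0;

	*ptep &= ~PTE_AF;
	return young;
}

/* Bool-returning batched variant: same loop structure as the patch,
 * OR-accumulating whether any of the @nr entries was young. */
static bool test_and_clear_young_ptes(uint64_t *ptep, unsigned int nr)
{
	bool young = false;

	for (;;) {
		young |= pte_test_and_clear_young(ptep);
		if (--nr == 0)
			break;
		ptep++;
	}

	return young;
}
```

Since `young` can only ever be true or false, the int-to-bool conversion is
purely a type cleanup; the accumulated `|=` logic is unchanged.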

>>   /*
>>    * On some architectures hardware does not set page access bit when accessing
>>    * memory page, it is responsibility of software setting this bit. It brings
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 1ba175b8d4f1..1b59be99dc3f 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -1813,16 +1813,23 @@ static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
>>   	return young;
>>   }
>>   
>> -static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
>> -					  unsigned long addr, pte_t *ptep)
>> +static inline int clear_young_ptes_notify(struct vm_area_struct *vma,
>> +					  unsigned long addr, pte_t *ptep,
>> +					  unsigned int nr)
>>   {
>>   	int young;
>>   
>> -	young = ptep_test_and_clear_young(vma, addr, ptep);
>> -	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
>> +	young = test_and_clear_young_ptes(vma, addr, ptep, nr);
>> +	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + nr * PAGE_SIZE);
>>   	return young;
>>   }
>>   
>> +static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
>> +					  unsigned long addr, pte_t *ptep)
>> +{
>> +	return clear_young_ptes_notify(vma, addr, ptep, 1);
>> +}
>> +
>>   static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
>>   					  unsigned long addr, pmd_t *pmdp)
>>   {
>> @@ -1837,9 +1844,15 @@ static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
>>   
>>   #define clear_flush_young_ptes_notify	clear_flush_young_ptes
>>   #define pmdp_clear_flush_young_notify	pmdp_clear_flush_young
>> -#define ptep_clear_young_notify	ptep_test_and_clear_young
>> +#define clear_young_ptes_notify	test_and_clear_young_ptes
>>   #define pmdp_clear_young_notify	pmdp_test_and_clear_young
>>   
>> +static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
>> +					  unsigned long addr, pte_t *ptep)
>> +{
>> +	return test_and_clear_young_ptes(vma, addr, ptep, 1);
>> +}
> 
> Why not outside of the ifdef a single generic
> 
> static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
> 		 unsigned long addr, pte_t *ptep)
> {
> 	return clear_young_ptes_notify(vma, addr, ptep, 1);
> }

Yes, will do. And this function will be removed in the following patch.
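The consolidation suggested above (one generic ptep_clear_young_notify()
outside the #ifdef, built on the batched wrapper) can likewise be sketched as
a userspace model. The names mirror the patch, but the bodies are stand-ins:
the pte_t is a uint64_t, PTE_AF is an arbitrary accessed-bit position, and
the MMU notifier is stubbed to report no secondary-MMU accesses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_AF    (1ULL << 10)	/* stand-in accessed ("young") bit */
#define PAGE_SIZE 4096ULL

/* Model of the generic batched helper from this patch. */
static bool test_and_clear_young_ptes(uint64_t *ptep, unsigned int nr)
{
	bool young = false;
	unsigned int i;

	for (i = 0; i < nr; i++) {
		young |= (ptep[i] & PTE_AF) != 0;
		ptep[i] &= ~PTE_AF;
	}

	return young;
}

/* Stub: the real mmu_notifier_clear_young() also asks secondary MMUs
 * (e.g. KVM) whether the range was accessed; here we model none. */
static bool mmu_notifier_clear_young(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	return false;
}

/* Batched notify wrapper: clear the primary page-table young bits first,
 * then fold in any secondary-MMU accesses over the same range. */
static bool clear_young_ptes_notify(uint64_t addr, uint64_t *ptep,
				    unsigned int nr)
{
	bool young = test_and_clear_young_ptes(ptep, nr);

	young |= mmu_notifier_clear_young(addr, addr + nr * PAGE_SIZE);
	return young;
}

/* A single generic single-PTE wrapper: the nr == 1 case simply reuses
 * the batched path, so no duplicate definition is needed per #ifdef arm. */
static bool ptep_clear_young_notify(uint64_t addr, uint64_t *ptep)
{
	return clear_young_ptes_notify(addr, ptep, 1);
}
```

The point of the consolidation is that both #ifdef branches end up calling
the same clear_young_ptes_notify(), so the nr == 1 wrapper needs only one
definition, and can later be dropped entirely once all callers are batched.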



Thread overview: 22+ messages
2026-02-24  1:56 [PATCH 0/5] support batched checking of the young flag for MGLRU Baolin Wang
2026-02-24  1:56 ` [PATCH 1/5] mm: use inline helper functions instead of ugly macros Baolin Wang
2026-02-24  2:36   ` Rik van Riel
2026-02-24  7:09   ` Barry Song
2026-02-25 13:56   ` David Hildenbrand (Arm)
2026-02-26  3:36     ` Baolin Wang
2026-02-24  1:56 ` [PATCH 2/5] mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced() Baolin Wang
2026-02-24  2:38   ` Rik van Riel
2026-02-24  5:49     ` Baolin Wang
2026-02-25 13:57       ` David Hildenbrand (Arm)
2026-02-24  6:34   ` Alistair Popple
2026-02-24  1:56 ` [PATCH 3/5] mm: add a batched helper to clear the young flag for large folios Baolin Wang
2026-02-24 22:03   ` Rik van Riel
2026-02-25  2:05     ` Baolin Wang
2026-02-25 14:04   ` David Hildenbrand (Arm)
2026-02-26  3:42     ` Baolin Wang [this message]
2026-02-24  1:56 ` [PATCH 4/5] mm: support batched checking of the young flag for MGLRU Baolin Wang
2026-02-24 22:12   ` Rik van Riel
2026-02-25 14:25   ` David Hildenbrand (Arm)
2026-02-26  5:56     ` Baolin Wang
2026-02-24  1:56 ` [PATCH 5/5] arm64: mm: implement the architecture-specific test_and_clear_young_ptes() Baolin Wang
2026-02-25  0:23   ` Rik van Riel
