From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Barry Song <21cnbao@gmail.com>, Dev Jain <dev.jain@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>,
akpm@linux-foundation.org, david@kernel.org,
catalin.marinas@arm.com, will@kernel.org,
lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, riel@surriel.com,
harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 5/5] mm: rmap: support batched unmapping for file large folios
Date: Fri, 16 Jan 2026 23:49:08 +0800
Message-ID: <84214d4c-2898-462c-ad27-fee39502128a@linux.alibaba.com>
In-Reply-To: <CAGsJ_4yo-EE__1VzHqxE8ehDOFQuxqP9GXcskkeZ-WTTyz12_A@mail.gmail.com>
On 1/16/26 10:28 PM, Barry Song wrote:
> On Fri, Jan 16, 2026 at 5:53 PM Dev Jain <dev.jain@arm.com> wrote:
>>
>>
>> On 07/01/26 7:16 am, Wei Yang wrote:
>>> On Wed, Jan 07, 2026 at 10:29:25AM +1300, Barry Song wrote:
>>>> On Wed, Jan 7, 2026 at 2:22 AM Wei Yang <richard.weiyang@gmail.com> wrote:
>>>>> On Fri, Dec 26, 2025 at 02:07:59PM +0800, Baolin Wang wrote:
>>>>>> Similar to folio_referenced_one(), we can apply batched unmapping for file
>>>>>> large folios to optimize the performance of file folio reclamation.
>>>>>>
>>>>>> Barry previously implemented batched unmapping for lazyfree anonymous large
>>>>>> folios[1] and did not further optimize anonymous large folios or file-backed
>>>>>> large folios at that stage. As for file-backed large folios, the batched
>>>>>> unmapping support is relatively straightforward, as we only need to clear
>>>>>> the consecutive (present) PTE entries for file-backed large folios.
>>>>>>
>>>>>> Performance testing:
>>>>>> Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
>>>>>> reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
>>>>>> 75% performance improvement on my Arm64 32-core server (and 50%+ improvement
>>>>>> on my X86 machine) with this patch.
>>>>>>
>>>>>> W/o patch:
>>>>>> real 0m1.018s
>>>>>> user 0m0.000s
>>>>>> sys 0m1.018s
>>>>>>
>>>>>> W/ patch:
>>>>>> real 0m0.249s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.249s
>>>>>>
>>>>>> [1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
>>>>>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>> Acked-by: Barry Song <baohua@kernel.org>
>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>> ---
>>>>>> mm/rmap.c | 7 ++++---
>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>> index 985ab0b085ba..e1d16003c514 100644
>>>>>> --- a/mm/rmap.c
>>>>>> +++ b/mm/rmap.c
>>>>>> @@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>>>>>>  	end_addr = pmd_addr_end(addr, vma->vm_end);
>>>>>>  	max_nr = (end_addr - addr) >> PAGE_SHIFT;
>>>>>>
>>>>>> -	/* We only support lazyfree batching for now ... */
>>>>>> -	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
>>>>>> +	/* We only support lazyfree or file folios batching for now ... */
>>>>>> +	if (folio_test_anon(folio) && folio_test_swapbacked(folio))
>>>>>>  		return 1;
>>>>>> +
>>>>>>  	if (pte_unused(pte))
>>>>>>  		return 1;
>>>>>>
>>>>>>
>>>>>> @@ -2231,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>>  			 *
>>>>>>  			 * See Documentation/mm/mmu_notifier.rst
>>>>>>  			 */
>>>>>> -			dec_mm_counter(mm, mm_counter_file(folio));
>>>>>> +			add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
>>>>>>  		}
>>>>>> discard:
>>>>>>  		if (unlikely(folio_test_hugetlb(folio))) {
>>>>>> --
>>>>>> 2.47.3
>>>>>>
>>>>> Hi, Baolin
>>>>>
>>>>> When reading your patch, I came up with one small question.
>>>>>
>>>>> Current try_to_unmap_one() has the following structure:
>>>>>
>>>>> try_to_unmap_one()
>>>>>     while (page_vma_mapped_walk(&pvmw)) {
>>>>>         nr_pages = folio_unmap_pte_batch()
>>>>>
>>>>>         if (nr_pages == folio_nr_pages(folio))
>>>>>             goto walk_done;
>>>>>     }
>>>>>
>>>>> I am wondering what happens if nr_pages > 1 but nr_pages != folio_nr_pages().
>>>>>
>>>>> If my understanding is correct, page_vma_mapped_walk() would start from
>>>>> (pvmw->address + PAGE_SIZE) in the next iteration, but we have already cleared
>>>>> up to (pvmw->address + nr_pages * PAGE_SIZE), right?
>>>>>
>>>>> I am not sure my understanding is correct; if it is, is there some reason not
>>>>> to skip the cleared range explicitly?
>>>> I don’t quite understand your question. For nr_pages > 1 but not equal
>>>> to folio_nr_pages(), page_vma_mapped_walk() will skip the nr_pages - 1 PTEs internally.
>>>>
>>>> take a look:
>>>>
>>>> next_pte:
>>>> 	do {
>>>> 		pvmw->address += PAGE_SIZE;
>>>> 		if (pvmw->address >= end)
>>>> 			return not_found(pvmw);
>>>> 		/* Did we cross page table boundary? */
>>>> 		if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
>>>> 			if (pvmw->ptl) {
>>>> 				spin_unlock(pvmw->ptl);
>>>> 				pvmw->ptl = NULL;
>>>> 			}
>>>> 			pte_unmap(pvmw->pte);
>>>> 			pvmw->pte = NULL;
>>>> 			pvmw->flags |= PVMW_PGTABLE_CROSSED;
>>>> 			goto restart;
>>>> 		}
>>>> 		pvmw->pte++;
>>>> 	} while (pte_none(ptep_get(pvmw->pte)));
>>>>
>>> Yes, we do it in page_vma_mapped_walk() now. Since they are pte_none(), they
>>> will be skipped.
>>>
>>> I mean maybe we can skip it in try_to_unmap_one(), for example:
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 9e5bd4834481..ea1afec7c802 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -2250,6 +2250,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>  		 */
>>>  		if (nr_pages == folio_nr_pages(folio))
>>>  			goto walk_done;
>>> +		else {
>>> +			pvmw.address += PAGE_SIZE * (nr_pages - 1);
>>> +			pvmw.pte += nr_pages - 1;
>>> +		}
>>>  		continue;
>>> walk_abort:
>>>  		ret = false;
>>
>> I am of the opinion that we should do something like this. In the internal pvmw code,
>
> I am still not convinced that skipping PTEs in try_to_unmap_one()
> is the right place. If we really want to skip certain PTEs early,
> should we instead hint page_vma_mapped_walk()? That said, I don't
> see much value in doing so, since in most cases nr is either 1 or
> folio_nr_pages(folio).
>
>> we keep skipping PTEs till the PTEs are none. With my proposed uffd-fix [1], if the old
>> PTEs were uffd-wp armed, pte_install_uffd_wp_if_needed() will convert all the PTEs from
>> none to non-none, and we will lose the batching effect. I also plan to extend support to
>> anonymous folios (thereby generalizing to all types of memory), which will set a
>> batch of PTEs as swap entries, and the internal pvmw code won't be able to skip through
>> the batch.
>
> Thanks for catching this, Dev. I already filter out some of the more
> complex cases, for example:
> 	if (pte_unused(pte))
> 		return 1;
Hi Dev, thanks for the report[1], in which you also explained why the
mm selftests can still pass.

[1] https://lore.kernel.org/linux-mm/20260116082721.275178-1-dev.jain@arm.com/
> Since the userfaultfd write-protection case is also a corner case,
> could we filter it out as well?
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c86f1135222b..6bb8ba6f046e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1870,6 +1870,9 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>  	if (pte_unused(pte))
>  		return 1;
>
> +	if (userfaultfd_wp(vma))
> +		return 1;
> +
>  	return folio_pte_batch(folio, pvmw->pte, pte, max_nr);
>  }
That small fix makes sense to me. I think Dev can continue adding support
for the UFFD batch optimization on top of it, and the UFFD batched operations
will need more review and testing, as David suggested[2].

[2] https://lore.kernel.org/all/9edeeef1-5553-406b-8e56-30b11809eec5@kernel.org/
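
For clarity, here is a rough sketch (not a formal patch) of roughly what
folio_unmap_pte_batch() would look like with Barry's userfaultfd_wp() check
folded in on top of this series; the earlier checks in the real helper are
elided, and the exact parameter list is only approximate:

/*
 * Sketch only: folio_unmap_pte_batch() with the userfaultfd_wp()
 * corner-case filter applied on top of this patch.
 */
static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
			struct page_vma_mapped_walk *pvmw,
			enum ttu_flags flags, pte_t pte)
{
	struct vm_area_struct *vma = pvmw->vma;
	unsigned long end_addr, addr = pvmw->address;
	unsigned int max_nr;

	/* ... earlier checks from the existing helper ... */

	/* Batching must stay within this VMA and this page table. */
	end_addr = pmd_addr_end(addr, vma->vm_end);
	max_nr = (end_addr - addr) >> PAGE_SHIFT;

	/* We only support lazyfree or file folios batching for now ... */
	if (folio_test_anon(folio) && folio_test_swapbacked(folio))
		return 1;

	if (pte_unused(pte))
		return 1;

	/* Corner case: keep uffd-wp armed VMAs on the single-PTE path. */
	if (userfaultfd_wp(vma))
		return 1;

	return folio_pte_batch(folio, pvmw->pte, pte, max_nr);
}

With this, uffd-wp armed VMAs always take nr_pages == 1, so Dev's fix would
only ever re-install a marker for the single cleared PTE, and the pvmw walk
should behave exactly as it did before batching.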