From: Barry Song <21cnbao@gmail.com>
To: Dev Jain <dev.jain@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
akpm@linux-foundation.org, david@kernel.org,
catalin.marinas@arm.com, will@kernel.org,
lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, riel@surriel.com,
harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 5/5] mm: rmap: support batched unmapping for file large folios
Date: Fri, 16 Jan 2026 22:28:35 +0800 [thread overview]
Message-ID: <CAGsJ_4yo-EE__1VzHqxE8ehDOFQuxqP9GXcskkeZ-WTTyz12_A@mail.gmail.com> (raw)
In-Reply-To: <db4d0a19-598b-48ff-accc-f5940a481035@arm.com>
On Fri, Jan 16, 2026 at 5:53 PM Dev Jain <dev.jain@arm.com> wrote:
>
>
> On 07/01/26 7:16 am, Wei Yang wrote:
> > On Wed, Jan 07, 2026 at 10:29:25AM +1300, Barry Song wrote:
> >> On Wed, Jan 7, 2026 at 2:22 AM Wei Yang <richard.weiyang@gmail.com> wrote:
> >>> On Fri, Dec 26, 2025 at 02:07:59PM +0800, Baolin Wang wrote:
> >>>> Similar to folio_referenced_one(), we can apply batched unmapping for file
> >>>> large folios to optimize the performance of file folios reclamation.
> >>>>
> >>>> Barry previously implemented batched unmapping for lazyfree anonymous large
> >>>> folios[1] and did not further optimize anonymous large folios or file-backed
> >>>> large folios at that stage. As for file-backed large folios, the batched
> >>>> unmapping support is relatively straightforward, as we only need to clear
> >>>> the consecutive (present) PTE entries for file-backed large folios.
> >>>>
> >>>> Performance testing:
> >>>> Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
> >>>> reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
> >>>> 75% performance improvement on my Arm64 32-core server (and 50%+ improvement
> >>>> on my X86 machine) with this patch.
> >>>>
> >>>> W/o patch:
> >>>> real 0m1.018s
> >>>> user 0m0.000s
> >>>> sys 0m1.018s
> >>>>
> >>>> W/ patch:
> >>>> real 0m0.249s
> >>>> user 0m0.000s
> >>>> sys 0m0.249s
> >>>>
> >>>> [1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
> >>>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>> Acked-by: Barry Song <baohua@kernel.org>
> >>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>>> ---
> >>>> mm/rmap.c | 7 ++++---
> >>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>> index 985ab0b085ba..e1d16003c514 100644
> >>>> --- a/mm/rmap.c
> >>>> +++ b/mm/rmap.c
> >>>> @@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> >>>> end_addr = pmd_addr_end(addr, vma->vm_end);
> >>>> max_nr = (end_addr - addr) >> PAGE_SHIFT;
> >>>>
> >>>> - /* We only support lazyfree batching for now ... */
> >>>> - if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
> >>>> + /* We only support lazyfree or file folios batching for now ... */
> >>>> + if (folio_test_anon(folio) && folio_test_swapbacked(folio))
> >>>> return 1;
> >>>> +
> >>>> if (pte_unused(pte))
> >>>> return 1;
> >>>>
> >>>> @@ -2231,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>> *
> >>>> * See Documentation/mm/mmu_notifier.rst
> >>>> */
> >>>> - dec_mm_counter(mm, mm_counter_file(folio));
> >>>> + add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
> >>>> }
> >>>> discard:
> >>>> if (unlikely(folio_test_hugetlb(folio))) {
> >>>> --
> >>>> 2.47.3
> >>>>
> >>> Hi, Baolin
> >>>
> >>> When reading your patch, I come up one small question.
> >>>
> >>> Current try_to_unmap_one() has following structure:
> >>>
> >>> try_to_unmap_one()
> >>> while (page_vma_mapped_walk(&pvmw)) {
> >>> nr_pages = folio_unmap_pte_batch()
> >>>
> >>> if (nr_pages == folio_nr_pages(folio))
> >>> goto walk_done;
> >>> }
> >>>
> >>> I am wondering what happens if nr_pages > 1 but nr_pages != folio_nr_pages(folio).
> >>>
> >>> If my understanding is correct, page_vma_mapped_walk() would start from
> >>> (pvmw->address + PAGE_SIZE) in next iteration, but we have already cleared to
> >>> (pvmw->address + nr_pages * PAGE_SIZE), right?
> >>>
> >>> If my understanding is correct, is there a reason not to skip the
> >>> cleared range?
> >> I don’t quite understand your question. For nr_pages > 1 but not equal
> >> to folio_nr_pages(folio), page_vma_mapped_walk() will skip the
> >> remaining nr_pages - 1 PTEs internally.
> >>
> >> take a look:
> >>
> >> next_pte:
> >> do {
> >> pvmw->address += PAGE_SIZE;
> >> if (pvmw->address >= end)
> >> return not_found(pvmw);
> >> /* Did we cross page table boundary? */
> >> if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
> >> if (pvmw->ptl) {
> >> spin_unlock(pvmw->ptl);
> >> pvmw->ptl = NULL;
> >> }
> >> pte_unmap(pvmw->pte);
> >> pvmw->pte = NULL;
> >> pvmw->flags |= PVMW_PGTABLE_CROSSED;
> >> goto restart;
> >> }
> >> pvmw->pte++;
> >> } while (pte_none(ptep_get(pvmw->pte)));
> >>
> > Yes, we do it in page_vma_mapped_walk() now. Since they are pte_none(), they
> > will be skipped.
> >
> > I mean maybe we can skip it in try_to_unmap_one(), for example:
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 9e5bd4834481..ea1afec7c802 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -2250,6 +2250,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > */
> > if (nr_pages == folio_nr_pages(folio))
> > goto walk_done;
> > + else {
> > + pvmw.address += PAGE_SIZE * (nr_pages - 1);
> > + pvmw.pte += nr_pages - 1;
> > + }
> > continue;
> > walk_abort:
> > ret = false;
>
> I am of the opinion that we should do something like this. In the internal pvmw code,
I am still not convinced that try_to_unmap_one() is the right place
to skip PTEs. If we really want to skip certain PTEs early, shouldn't
we hint page_vma_mapped_walk() instead? That said, I don't see much
value in doing so, since in most cases nr is either 1 or
folio_nr_pages(folio).
> we keep skipping ptes till the ptes are none. With my proposed uffd-fix [1], if the old
> ptes were uffd-wp armed, pte_install_uffd_wp_if_needed will convert all ptes from none
> to not none, and we will lose the batching effect. I also plan to extend support to
> anonymous folios (therefore generalizing for all types of memory) which will set a
> batch of ptes as swap, and the internal pvmw code won't be able to skip through the
> batch.
Thanks for catching this, Dev. I already filter out some of the more
complex cases, for example:
if (pte_unused(pte))
return 1;
Since the userfaultfd write-protection case is also a corner case,
could we filter it out as well?
diff --git a/mm/rmap.c b/mm/rmap.c
index c86f1135222b..6bb8ba6f046e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1870,6 +1870,9 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	if (pte_unused(pte))
 		return 1;
 
+	if (userfaultfd_wp(vma))
+		return 1;
+
 	return folio_pte_batch(folio, pvmw->pte, pte, max_nr);
 }
Just offering a second option — yours is probably better.
Thanks
Barry