From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Barry Song <v-songbaohua@oppo.com>,
	David Hildenbrand <david@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	Kairui Song <kasong@tencent.com>, Chris Li <chrisl@kernel.org>,
	Baoquan He <bhe@redhat.com>,
	Dan Schatzberg <schatzberg.dan@gmail.com>,
	Kaixiong Yu <yukaixiong@huawei.com>, Fan Ni <fan.ni@samsung.com>,
	Tangquan Zheng <zhengtangquan@oppo.com>
Subject: Re: [PATCH RFC] mm: make try_to_unmap_one support batched unmap for anon large folios
Date: Thu, 15 May 2025 11:40:59 +0800
Message-ID: <1a985416-c8c5-429f-a83a-3c66be939439@linux.alibaba.com>
In-Reply-To: <CAGsJ_4yF-evyWCvF=aO_Y2UzpPMAb4zdOe3i5AtR_RBVDbuUZA@mail.gmail.com>



On 2025/5/15 09:35, Barry Song wrote:
> On Wed, May 14, 2025 at 8:11 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/5/13 16:46, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> My commit 354dffd29575c ("mm: support batched unmap for lazyfree large
>>> folios during reclamation") introduced support for unmapping entire
>>> lazyfree anonymous large folios at once, instead of one page at a time.
>>> This patch extends that support to generic (non-lazyfree) anonymous
>>> large folios.
>>>
>>> Handling __folio_try_share_anon_rmap() and swap_duplicate() becomes
>>> extremely complex—if not outright impractical—for non-exclusive
>>> anonymous folios. As a result, this patch limits support to exclusive
>>> large folios. Fortunately, most anonymous folios are exclusive in
>>> practice, so this restriction should be acceptable in the majority of
>>> cases.
>>>
>>> SPARC is currently the only architecture that implements
>>> arch_unmap_one(), which also needs to be batched for consistency.
>>> However, this is not yet supported, so the platform is excluded for
>>> now.
>>>
>>> The following micro-benchmark measures the time taken to perform
>>> PAGEOUT on 256MiB of 64KiB anonymous large folios:
>>>
>>>    #define _GNU_SOURCE
>>>    #include <stdio.h>
>>>    #include <stdlib.h>
>>>    #include <sys/mman.h>
>>>    #include <string.h>
>>>    #include <time.h>
>>>    #include <unistd.h>
>>>    #include <errno.h>
>>>
>>>    #define SIZE_MB 256
>>>    #define SIZE_BYTES (SIZE_MB * 1024 * 1024)
>>>
>>>    int main() {
>>>        void *addr = mmap(NULL, SIZE_BYTES, PROT_READ | PROT_WRITE,
>>>                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>        if (addr == MAP_FAILED) {
>>>            perror("mmap failed");
>>>            return 1;
>>>        }
>>>
>>>        memset(addr, 0, SIZE_BYTES);
>>>
>>>        struct timespec start, end;
>>>        clock_gettime(CLOCK_MONOTONIC, &start);
>>>
>>>        if (madvise(addr, SIZE_BYTES, MADV_PAGEOUT) != 0) {
>>>            perror("madvise(MADV_PAGEOUT) failed");
>>>            munmap(addr, SIZE_BYTES);
>>>            return 1;
>>>        }
>>>
>>>        clock_gettime(CLOCK_MONOTONIC, &end);
>>>
>>>        long duration_ns = (end.tv_sec - start.tv_sec) * 1e9 +
>>>                           (end.tv_nsec - start.tv_nsec);
>>>        printf("madvise(MADV_PAGEOUT) took %ld ns (%.3f ms)\n",
>>>               duration_ns, duration_ns / 1e6);
>>>
>>>        munmap(addr, SIZE_BYTES);
>>>        return 0;
>>>    }
>>>
>>> w/o patch:
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1337334000 ns (1337.334 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1340471008 ns (1340.471 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1385718992 ns (1385.719 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1366070000 ns (1366.070 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1347834992 ns (1347.835 ms)
>>>
>>> w/patch:
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 698178000 ns (698.178 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 708570000 ns (708.570 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 693884000 ns (693.884 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 693366000 ns (693.366 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 690790000 ns (690.790 ms)
>>>
>>> We found that the time to reclaim this memory was reduced by half.
>>
>> Do you have some performance numbers for the base page?
> 
> We verified that batching is attempted only when folio_test_large(folio)
> is true; otherwise, nr_pages remains 1 for each folio.
> 
>         if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
>             can_batch_unmap_folio_ptes(address, folio, pvmw.pte,
>                                        anon_exclusive))
>                 nr_pages = folio_nr_pages(folio);
> 
> I didn't expect any noticeable performance change for base pages, but testing
> shows the patch appears to make them slightly faster—likely due to test noise or
> jitter.
> 
> W/o patch:
> 
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5686488000 ns (5686.488 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5628330992 ns (5628.331 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5771742992 ns (5771.743 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5672108000 ns (5672.108 ms)
> 
> 
> W/ patch:
> 
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5481578000 ns (5481.578 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5425394992 ns (5425.395 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5522109008 ns (5522.109 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5506832000 ns (5506.832 ms)

Thanks. My expectation is also that the batch processing of large folios 
should not have a performance impact on the base pages, but it would be 
best to clearly state this in the commit message.

