Re: [RFC V3 0/9] Accelerate page migration with batch copying and hardware offload

linux-mm.kvack.org archive mirror

From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Shivank Garg <shivankg@amd.com>,
	 akpm@linux-foundation.org, david@redhat.com,
	 willy@infradead.org,  matthew.brost@intel.com,
	joshua.hahnjy@gmail.com,  rakie.kim@sk.com,  byungchul@sk.com,
	gourry@gourry.net,  apopple@nvidia.com,
	 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	 vbabka@suse.cz,  rppt@kernel.org, surenb@google.com,
	 mhocko@suse.com,  vkoul@kernel.org, lucas.demarchi@intel.com,
	 rdunlap@infradead.org,  jgg@ziepe.ca, kuba@kernel.org,
	 justonli@chromium.org,  ivecera@redhat.com,
	dave.jiang@intel.com,  Jonathan.Cameron@huawei.com,
	dan.j.williams@intel.com,  rientjes@google.com,
	Raghavendra.KodsaraThimmappa@amd.com,  bharata@amd.com,
	alirad.malek@zptcorp.com,  yiannis@zptcorp.com,
	 weixugc@google.com, linux-kernel@vger.kernel.org,
	 linux-mm@kvack.org
Subject: Re: [RFC V3 0/9] Accelerate page migration with batch copying and hardware offload
Date: Wed, 24 Sep 2025 11:11:36 +0800
Message-ID: <87tt0sfst3.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <C8E561B3-B9DB-4F58-A2C7-4EE17E08A993@nvidia.com> (Zi Yan's message of "Tue, 23 Sep 2025 22:03:18 -0400")

Zi Yan <ziy@nvidia.com> writes:

> On 23 Sep 2025, at 21:49, Huang, Ying wrote:
>
>> Hi, Shivank,
>>
>> Thanks for working on this!
>>
>> Shivank Garg <shivankg@amd.com> writes:
>>
>>> This is the third RFC of the patchset to enhance page migration by batching
>>> folio-copy operations and enabling acceleration via multi-threaded CPU or
>>> DMA offload.
>>>
>>> Single-threaded, folio-by-folio copying bottlenecks page migration
>>> in modern systems with deep memory hierarchies, especially for large
>>> folios where copy overhead dominates, leaving significant hardware
>>> potential untapped.
>>>
>>> By batching the copy phase, we create an opportunity for significant
>>> hardware acceleration. This series builds a framework for this acceleration
>>> and provides two initial offload driver implementations: one using multiple
>>> CPU threads (mtcopy) and another leveraging the DMAEngine subsystem (dcbm).
>>>
>>> This version incorporates significant feedback to improve correctness,
>>> robustness, and the efficiency of the DMA offload path.
>>>
>>> Changelog since V2:
>>>
>>> 1. DMA Engine Rewrite:
>>>    - Switched from per-folio dma_map_page() to batch dma_map_sgtable()
>>>    - Single completion interrupt per batch (reduced overhead)
>>>    - Order of magnitude improvement in setup time for large batches
>>> 2. Code cleanups and refactoring
>>> 3. Rebased on latest mainline (6.17-rc6+)
>>>
>>> MOTIVATION:
>>> -----------
>>>
>>> Current Migration Flow:
>>> [ move_pages(), Compaction, Tiering, etc. ]
>>>               |
>>>               v
>>>      [ migrate_pages() ] // Common entry point
>>>               |
>>>               v
>>>     [ migrate_pages_batch() ] // NR_MAX_BATCHED_MIGRATION (512) folios at a time
>>>       |
>>>       |--> [ migrate_folio_unmap() ]
>>>       |
>>>       |--> [ try_to_unmap_flush() ] // Perform a single, batched TLB flush
>>>       |
>>>       |--> [ migrate_folios_move() ] // Bottleneck: Interleaved copy
>>>            - For each folio:
>>>              - Metadata prep: Copy flags, mappings, etc.
>>>              - folio_copy()  <-- Single-threaded, serial data copy.
>>>              - Update PTEs & finalize for that single folio.
>>>
>>> Understanding overheads in page migration (move_pages() syscall):
>>>
>>> Total move_pages() overheads = folio_copy() + Other overheads
>>> 1. folio_copy() is the core copy operation that interests us.
>>> 2. The remaining operations are user/kernel transitions, page table walks,
>>> locking, folio unmap, dst folio alloc, TLB flush, copying flags, updating
>>> mappings and PTEs etc. that contribute to the remaining overheads.
>>>
>>> Percentage of folio_copy() overheads in move_pages(N pages) syscall time:
>>> Number of pages being migrated and folio size:
>>>             4KB     2MB
>>> 1 page     <1%     ~66%
>>> 512 pages  ~35%    ~97%
>>>
>>> Based on Amdahl's Law, optimizing folio_copy() for large pages offers a
>>> substantial performance opportunity.
>>>
>>> move_pages() syscall speedup = 1 / ((1 - F) + (F / S))
>>> Where F is the fraction of time spent in folio_copy() and S is the speedup of
>>> folio_copy().
>>>
>>> For 4KB folios, folio copy overheads are too small in single-page
>>> migrations to affect the overall speedup; even for 512 pages, the maximum
>>> theoretical speedup is limited to ~1.54x with infinite folio_copy() speedup.
>>>
>>> For 2MB THPs, folio copy overheads are significant even in single page
>>> migrations, with a theoretical speedup of ~3x with infinite folio_copy()
>>> speedup and up to ~33x for 512 pages.
>>>
>>> A realistic value of S (speedup of folio_copy()) is 7.5x for DMA offload
>>> based on my measurements for copying 512 2MB pages.
>>> This gives move_pages() a practical speedup of ~6.3x for 512 2MB pages
>>> (also observed in the experiments below).
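
As a quick sanity check of the figures quoted above, the Amdahl's Law
expression can be evaluated directly. This is only a sketch: the function
name move_pages_speedup() is made up for illustration, and the F and S
values are taken from the quoted table and text, not measured here.

```python
import math


def move_pages_speedup(f, s):
    """Amdahl's Law: overall speedup when a fraction f of the time is sped up by s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be within [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    # f / math.inf evaluates to 0.0, which models an infinitely fast copy.
    denominator = (1.0 - f) + f / s
    if denominator == 0.0:
        return math.inf
    return 1.0 / denominator


# F values are the folio_copy() shares of move_pages() time from the quoted table.
cases = [
    ("512 x 4KB, infinite S", 0.35, math.inf),
    ("1 x 2MB, infinite S", 0.66, math.inf),
    ("512 x 2MB, infinite S", 0.97, math.inf),
    ("512 x 2MB, S = 7.5 (DMA)", 0.97, 7.5),
]

for label, f, s in cases:
    print(f"{label}: {move_pages_speedup(f, s):.2f}x")
```

The four cases come out at roughly 1.54x, 2.9x, 33x and 6.3x, which
matches the numbers in the cover letter.
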
>>>
>>> DESIGN: A Pluggable Migrator Framework
>>> ---------------------------------------
>>>
>>> Introduce migrate_folios_batch_move():
>>>
>>> [ migrate_pages_batch() ]
>>>     |
>>>     |--> migrate_folio_unmap()
>>>     |
>>>     |--> try_to_unmap_flush()
>>>     |
>>>     +--> [ migrate_folios_batch_move() ] // new batched design
>>>             |
>>>             |--> Metadata migration
>>>             |    - Metadata prep: Copy flags, mappings, etc.
>>>             |    - Use MIGRATE_NO_COPY to skip the actual data copy.
>>>             |
>>>             |--> Batch copy folio data
>>>             |    - Migrator is configurable at runtime via sysfs.
>>>             |
>>>             |          static_call(_folios_copy) // Pluggable migrators
>>>             |          /          |            \
>>>             |         v           v             v
>>>             | [ Default ]  [ MT CPU copy ]  [ DMA Offload ]
>>>             |
>>>             +--> Update PTEs to point to dst folios and complete migration.
>>>
>>
>> I'm just jumping into the discussion, so this may have been discussed
>> already.  Sorry if so.  Why not
>>
>> migrate_folios_unmap()
>> try_to_unmap_flush()
>> copy folios in parallel if possible
>> migrate_folios_move(): with MIGRATE_NO_COPY?
>
> move_to_new_folio() does various migration preparation work, any of
> which can fail. Copying folios regardless might lead to some
> unnecessary work. What is your take on this?

Good point, we should skip copying folios that fail the checks.
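
A rough userspace model of that ordering, only to make the idea concrete.
It is not kernel code: the Folio class, the prep_ok flag and both helper
functions are hypothetical stand-ins, and the "batched copy" is a plain
loop.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    name: str
    prep_ok: bool = True   # hypothetical: result of the pre-copy checks
    data: bytes = b""


def batch_copy(pairs):
    """Stand-in for a batched or offloaded copy; here just a plain loop."""
    for src, dst in pairs:
        dst.data = src.data


def migrate_batch(srcs, dsts):
    # Step 1: run the preparation checks first, keep only folios that pass.
    to_copy = [(s, d) for s, d in zip(srcs, dsts) if s.prep_ok]
    failed = [s.name for s in srcs if not s.prep_ok]
    # Step 2: one batched copy for the survivors; failed folios are never copied.
    batch_copy(to_copy)
    # Step 3: the per-folio move would then run without copying the data again.
    migrated = [s.name for s, _ in to_copy]
    return migrated, failed


srcs = [Folio("a", True, b"x"), Folio("b", False, b"y"), Folio("c", True, b"z")]
dsts = [Folio("a-dst"), Folio("b-dst"), Folio("c-dst")]
migrated, failed = migrate_batch(srcs, dsts)
print("migrated:", migrated, "skipped:", failed)
```

The point is only that the filter runs before the batched copy, so no
copy work is spent on a folio whose preparation failed.
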

>>
>>> User Control of Migrator:
>>>
>>> # echo 1 > /sys/kernel/dcbm/offloading
>>>    |
>>>    +--> Driver's sysfs handler
>>>         |
>>>         +--> calls start_offloading(&cpu_migrator)
>>>               |
>>>               +--> calls offc_update_migrator()
>>>                     |
>>>                     +--> static_call_update(_folios_copy, mig->migrate_offc)
>>>
>>> Later, During Migration ...
>>> migrate_folios_batch_move()
>>>     |
>>>     +--> static_call(_folios_copy) // Now dispatches to the selected migrator
>>>           |
>>>           +-> [ mtcopy | dcbm | kernel_default ]
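
The runtime selection shown in the diagram can be modelled in userspace
as a single retargetable call site. This is a loose analogy, not the
series' implementation: static_call() patches the call instruction in
the kernel, while the FoliosCopyDispatch class and the two copy functions
below are hypothetical and only swap a Python reference.

```python
def kernel_default_copy(pairs):
    """Model of the default path: copy each (src, dst) pair in order."""
    for src, dst in pairs:
        dst[:] = src
    return "kernel_default"


def mtcopy_copy(pairs):
    """Placeholder for a multi-threaded copier; the copy itself is the same here."""
    for src, dst in pairs:
        dst[:] = src
    return "mtcopy"


class FoliosCopyDispatch:
    """One retargetable call site, loosely analogous to static_call(_folios_copy)."""

    def __init__(self, default):
        self._default = default
        self._target = default

    def update(self, fn):
        # Analogous to static_call_update(): later calls go to fn.
        self._target = fn

    def reset(self):
        # Fall back to the default copy path.
        self._target = self._default

    def __call__(self, pairs):
        return self._target(pairs)


folios_copy = FoliosCopyDispatch(kernel_default_copy)

src, dst = bytearray(b"folio data"), bytearray(10)
print(folios_copy([(src, dst)]))   # dispatches to the default migrator
folios_copy.update(mtcopy_copy)    # what the sysfs write would trigger
print(folios_copy([(src, dst)]))   # dispatches to the selected migrator
```

Callers never change: only the target behind the call site does, which is
what lets the migrator be switched at runtime.
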
>>>
>>
>> [snip]

---
Best Regards,
Huang, Ying
