From: Shivank Garg <shivankg@amd.com>
To: Zi Yan <ziy@nvidia.com>, linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>,
Aneesh Kumar <AneeshKumar.KizhakeVeetil@arm.com>,
David Hildenbrand <david@redhat.com>,
John Hubbard <jhubbard@nvidia.com>,
Kirill Shutemov <k.shutemov@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mel.gorman@gmail.com>,
"Rao, Bharata Bhasker" <bharata@amd.com>,
Rik van Riel <riel@surriel.com>,
RaghavendraKT <Raghavendra.KodsaraThimmappa@amd.com>,
Wei Xu <weixugc@google.com>, Suyeon Lee <leesuyeon0506@gmail.com>,
Lei Chen <leillc@google.com>,
"Shukla, Santosh" <santosh.shukla@amd.com>,
"Grimm, Jon" <jon.grimm@amd.com>,
sj@kernel.org, shy828301@gmail.com,
Liam Howlett <liam.howlett@oracle.com>,
Gregory Price <gregory.price@memverge.com>,
"Huang, Ying" <ying.huang@linux.alibaba.com>
Subject: Re: [RFC PATCH 3/5] mm/migrate: add migrate_folios_batch_move to batch the folio move operations
Date: Thu, 9 Jan 2025 17:17:50 +0530
Message-ID: <97ed042a-fe70-46cf-80f1-59e7add66860@amd.com>
In-Reply-To: <20250103172419.4148674-4-ziy@nvidia.com>

On 1/3/2025 10:54 PM, Zi Yan wrote:
> This is a preparatory patch that enables batch copying for folios
> undergoing migration. By batch copying the folio contents, we can
> efficiently utilize the capabilities of DMA hardware or a multi-threaded
> folio copy. It also adds MIGRATE_NO_COPY back to migrate_mode, so that
> the folio copy is skipped during the metadata copy and performed later
> in a batch.
>
> Currently, the folio move operation is performed individually for each
> folio, in a sequential manner:
> for_each_folio() {
> Copy folio metadata like flags and mappings
> Copy the folio content from src to dst
> Update page tables with dst folio
> }
>
> With this patch, we transition to a batch processing approach as shown
> below:
> for_each_folio() {
> Copy folio metadata like flags and mappings
> }
> Batch copy all src folios to dst
> for_each_folio() {
> Update page tables with dst folios
> }
>
> dst->private is used to store the page states and a possible anon_vma
> pointer, and thus needs to be cleared during the metadata copy. To avoid
> allocating additional memory to hold this data during the batch copy, it
> is moved to src->private after the metadata copy, since src->private is
> no longer used at that point.
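
Just to make sure I am reading the ->private data flow right, here is my
rough sketch of it (not a literal hunk, the names are the ones used in
this patch):

	/* __migrate_folio_record(): anon_vma and the old page state are
	 * packed into a single word in dst->private. */
	dst->private = (void *)anon_vma + old_page_state;

	/* __migrate_folio() with MIGRATE_NO_COPY: the byte copy is skipped,
	 * dst->private is cleared, and the packed word is parked in
	 * src->private, which src no longer needs at this point. */
	src->private = (void *)dst_private;

	/* The batch-copy and finalize phases then read it back from src. */
	private = (unsigned long)src->private;
	anon_vma = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
	old_page_state = private & PAGE_OLD_STATES;

Please correct me if I got any of this wrong.
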
>
> Originally-by: Shivank Garg <shivankg@amd.com>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---

Hi Zi,

Please retain my Signed-off-by in future postings of the batch page
migration patchset.

I think we can separate out the MIGRATE_NO_COPY support into a separate
patch.

Thanks,
Shivank

> include/linux/migrate_mode.h | 2 +
> mm/migrate.c | 207 +++++++++++++++++++++++++++++++++--
> 2 files changed, 201 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> index 265c4328b36a..9af6c949a057 100644
> --- a/include/linux/migrate_mode.h
> +++ b/include/linux/migrate_mode.h
> @@ -7,11 +7,13 @@
> * on most operations but not ->writepage as the potential stall time
> * is too significant
> * MIGRATE_SYNC will block when migrating pages
> + * MIGRATE_NO_COPY will not copy page content
> */
> enum migrate_mode {
> MIGRATE_ASYNC,
> MIGRATE_SYNC_LIGHT,
> MIGRATE_SYNC,
> + MIGRATE_NO_COPY,
> };
>
> enum migrate_reason {
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a83508f94c57..95c4cc4a7823 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -51,6 +51,7 @@
>
> #include "internal.h"
>
> +
> bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> {
> struct folio *folio = folio_get_nontail_page(page);
> @@ -752,14 +753,19 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
> enum migrate_mode mode)
> {
> int rc, expected_count = folio_expected_refs(mapping, src);
> + unsigned long dst_private = (unsigned long)dst->private;
>
> /* Check whether src does not have extra refs before we do more work */
> if (folio_ref_count(src) != expected_count)
> return -EAGAIN;
>
> - rc = folio_mc_copy(dst, src);
> - if (unlikely(rc))
> - return rc;
> + if (mode == MIGRATE_NO_COPY)
> + dst->private = NULL;
> + else {
> + rc = folio_mc_copy(dst, src);
> + if (unlikely(rc))
> + return rc;
> + }
>
> rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
> if (rc != MIGRATEPAGE_SUCCESS)
> @@ -769,6 +775,10 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
> folio_attach_private(dst, folio_detach_private(src));
>
> folio_migrate_flags(dst, src);
> +
> + if (mode == MIGRATE_NO_COPY)
> + src->private = (void *)dst_private;
> +
> return MIGRATEPAGE_SUCCESS;
> }
>
> @@ -1042,7 +1052,7 @@ static int _move_to_new_folio_prep(struct folio *dst, struct folio *src,
> mode);
> else
> rc = fallback_migrate_folio(mapping, dst, src, mode);
> - } else {
> + } else if (mode != MIGRATE_NO_COPY) {
> const struct movable_operations *mops;
>
> /*
> @@ -1060,7 +1070,8 @@ static int _move_to_new_folio_prep(struct folio *dst, struct folio *src,
> rc = mops->migrate_page(&dst->page, &src->page, mode);
> WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
> !folio_test_isolated(src));
> - }
> + } else
> + rc = -EAGAIN;
> out:
> return rc;
> }
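
If I follow the new _move_to_new_folio_prep() correctly, the
MIGRATE_NO_COPY handling boils down to this (my summary, not a literal
hunk):

	if (is_lru)
		/* metadata-only migration, the byte copy is deferred */
		rc = migrate_folio()/fallback path with MIGRATE_NO_COPY;
	else if (mode != MIGRATE_NO_COPY)
		rc = mops->migrate_page(...);
	else
		/* non-LRU movable folios cannot be batch-copied; bounce
		 * them back with -EAGAIN so the per-folio path handles
		 * them on retry */
		rc = -EAGAIN;
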
> @@ -1138,7 +1149,7 @@ static void __migrate_folio_record(struct folio *dst,
> dst->private = (void *)anon_vma + old_page_state;
> }
>
> -static void __migrate_folio_extract(struct folio *dst,
> +static void __migrate_folio_read(struct folio *dst,
> int *old_page_state,
> struct anon_vma **anon_vmap)
> {
> @@ -1146,6 +1157,13 @@ static void __migrate_folio_extract(struct folio *dst,
>
> *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
> *old_page_state = private & PAGE_OLD_STATES;
> +}
> +
> +static void __migrate_folio_extract(struct folio *dst,
> + int *old_page_state,
> + struct anon_vma **anon_vmap)
> +{
> + __migrate_folio_read(dst, old_page_state, anon_vmap);
> dst->private = NULL;
> }
>
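
For other readers of the thread: the difference between the two helpers
is that __migrate_folio_read() only unpacks the values, while
__migrate_folio_extract() additionally clears ->private. Since
__migrate_folio() relocates the packed word from dst->private to
src->private under MIGRATE_NO_COPY, the later phases call the extract
variant on the src folio:

	/* metadata phase: peek, leave dst->private in place for now */
	__migrate_folio_read(dst, &old_page_state, &anon_vma);

	/* copy/finalize phases: the word now lives in src->private */
	__migrate_folio_extract(folio, &old_page_state, &anon_vma);
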
> @@ -1771,6 +1789,174 @@ static void migrate_folios_move(struct list_head *src_folios,
> }
> }
>
> +static void migrate_folios_batch_move(struct list_head *src_folios,
> + struct list_head *dst_folios,
> + free_folio_t put_new_folio, unsigned long private,
> + enum migrate_mode mode, int reason,
> + struct list_head *ret_folios,
> + struct migrate_pages_stats *stats,
> + int *retry, int *thp_retry, int *nr_failed,
> + int *nr_retry_pages)
> +{
> + struct folio *folio, *folio2, *dst, *dst2;
> + int rc, nr_pages = 0, nr_mig_folios = 0;
> + int old_page_state = 0;
> + struct anon_vma *anon_vma = NULL;
> + bool is_lru;
> + int is_thp = 0;
> + LIST_HEAD(err_src);
> + LIST_HEAD(err_dst);
> +
> + if (mode != MIGRATE_ASYNC) {
> + *retry += 1;
> + return;
> + }
> +
> + /*
> + * Iterate over the list of locked src/dst folios to copy the metadata
> + */
> + dst = list_first_entry(dst_folios, struct folio, lru);
> + dst2 = list_next_entry(dst, lru);
> + list_for_each_entry_safe(folio, folio2, src_folios, lru) {
> + is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
> + nr_pages = folio_nr_pages(folio);
> + is_lru = !__folio_test_movable(folio);
> +
> + /*
> + * dst->private is not cleared here. It is cleared and moved to
> + * src->private in __migrate_folio().
> + */
> + __migrate_folio_read(dst, &old_page_state, &anon_vma);
> +
> + /*
> + * Use MIGRATE_NO_COPY mode in migrate_folio family functions
> + * to copy the flags, mapping and some other ancillary information.
> + * This does everything except the page copy. The actual page copy
> + * is handled later in a batch manner.
> + */
> + rc = _move_to_new_folio_prep(dst, folio, MIGRATE_NO_COPY);
> +
> + /*
> + * -EAGAIN: Move src/dst folios to tmp lists for retry
> + * Other Errno: Put src folio on ret_folios list, remove the dst folio
> + * Success: Copy the folio bytes, restoring working pte, unlock and
> + * decrement refcounter
> + */
> + if (rc == -EAGAIN) {
> + *retry += 1;
> + *thp_retry += is_thp;
> + *nr_retry_pages += nr_pages;
> +
> + list_move_tail(&folio->lru, &err_src);
> + list_move_tail(&dst->lru, &err_dst);
> + __migrate_folio_record(dst, old_page_state, anon_vma);
> + } else if (rc != MIGRATEPAGE_SUCCESS) {
> + *nr_failed += 1;
> + stats->nr_thp_failed += is_thp;
> + stats->nr_failed_pages += nr_pages;
> +
> + list_del(&dst->lru);
> + migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
> + anon_vma, true, ret_folios);
> + migrate_folio_undo_dst(dst, true, put_new_folio, private);
> + } else /* MIGRATEPAGE_SUCCESS */
> + nr_mig_folios++;
> +
> + dst = dst2;
> + dst2 = list_next_entry(dst, lru);
> + }
> +
> + /* Exit if folio list for batch migration is empty */
> + if (!nr_mig_folios)
> + goto out;
> +
> + /* Batch copy the folios */
> + {
> + dst = list_first_entry(dst_folios, struct folio, lru);
> + dst2 = list_next_entry(dst, lru);
> + list_for_each_entry_safe(folio, folio2, src_folios, lru) {
> + is_thp = folio_test_large(folio) &&
> + folio_test_pmd_mappable(folio);
> + nr_pages = folio_nr_pages(folio);
> + rc = folio_mc_copy(dst, folio);
> +
> + if (rc) {
> + int old_page_state = 0;
> + struct anon_vma *anon_vma = NULL;
> +
> + /*
> + * dst->private is moved to src->private in
> + * __migrate_folio(), so page state and anon_vma
> + * values can be extracted from (src) folio.
> + */
> + __migrate_folio_extract(folio, &old_page_state,
> + &anon_vma);
> + migrate_folio_undo_src(folio,
> + old_page_state & PAGE_WAS_MAPPED,
> + anon_vma, true, ret_folios);
> + list_del(&dst->lru);
> + migrate_folio_undo_dst(dst, true, put_new_folio,
> + private);
> + }
> +
> + switch (rc) {
> + case MIGRATEPAGE_SUCCESS:
> + stats->nr_succeeded += nr_pages;
> + stats->nr_thp_succeeded += is_thp;
> + break;
> + default:
> + *nr_failed += 1;
> + stats->nr_thp_failed += is_thp;
> + stats->nr_failed_pages += nr_pages;
> + break;
> + }
> +
> + dst = dst2;
> + dst2 = list_next_entry(dst, lru);
> + }
> + }
> +
> + /*
> + * Iterate the folio lists to remove migration pte and restore them
> + * as working pte. Unlock the folios, add/remove them to LRU lists (if
> + * applicable) and release the src folios.
> + */
> + dst = list_first_entry(dst_folios, struct folio, lru);
> + dst2 = list_next_entry(dst, lru);
> + list_for_each_entry_safe(folio, folio2, src_folios, lru) {
> + is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
> + nr_pages = folio_nr_pages(folio);
> + /*
> + * dst->private is moved to src->private in __migrate_folio(),
> + * so page state and anon_vma values can be extracted from
> + * (src) folio.
> + */
> + __migrate_folio_extract(folio, &old_page_state, &anon_vma);
> + list_del(&dst->lru);
> +
> + _move_to_new_folio_finalize(dst, folio, MIGRATEPAGE_SUCCESS);
> +
> + /*
> + * Below few steps are only applicable for lru pages which is
> + * ensured as we have removed the non-lru pages from our list.
> + */
> + _migrate_folio_move_finalize1(folio, dst, old_page_state);
> +
> + _migrate_folio_move_finalize2(folio, dst, reason, anon_vma);
> +
> + /* Page migration successful, increase stat counter */
> + stats->nr_succeeded += nr_pages;
> + stats->nr_thp_succeeded += is_thp;
> +
> + dst = dst2;
> + dst2 = list_next_entry(dst, lru);
> + }
> +out:
> + /* Add tmp folios back to the list to let CPU re-attempt migration. */
> + list_splice(&err_src, src_folios);
> + list_splice(&err_dst, dst_folios);
> +}
> +
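
One design point that might be worth calling out in the commit message:
batching is only attempted for MIGRATE_ASYNC, and anything that cannot
be batched is handed back to the existing per-folio path, i.e.

	if (mode != MIGRATE_ASYNC) {
		*retry += 1;	/* whole batch goes to migrate_folios_move() */
		return;
	}

while folios that hit -EAGAIN during the metadata phase are collected on
err_src/err_dst and spliced back at the end, so the caller's retry loop
picks them up.
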
> static void migrate_folios_undo(struct list_head *src_folios,
> struct list_head *dst_folios,
> free_folio_t put_new_folio, unsigned long private,
> @@ -1981,13 +2167,18 @@ static int migrate_pages_batch(struct list_head *from,
> /* Flush TLBs for all unmapped folios */
> try_to_unmap_flush();
>
> - retry = 1;
> + retry = 0;
> + /* Batch move the unmapped folios */
> + migrate_folios_batch_move(&unmap_folios, &dst_folios, put_new_folio,
> + private, mode, reason, ret_folios, stats, &retry,
> + &thp_retry, &nr_failed, &nr_retry_pages);
> +
> for (pass = 0; pass < nr_pass && retry; pass++) {
> retry = 0;
> thp_retry = 0;
> nr_retry_pages = 0;
>
> - /* Move the unmapped folios */
> + /* Move the remaining unmapped folios */
> migrate_folios_move(&unmap_folios, &dst_folios,
> put_new_folio, private, mode, reason,
> ret_folios, stats, &retry, &thp_retry,
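
At the call site the new control flow reads, as far as I can tell:

	retry = 0;
	migrate_folios_batch_move(...);		/* bumps retry for leftovers */
	for (pass = 0; pass < nr_pass && retry; pass++)
		migrate_folios_move(...);	/* per-folio retry/fallback */

so the old loop only runs when the batch path left something behind,
which matches the intent described above.
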