From: Zi Yan <ziy@nvidia.com>
To: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Cc: linux-mm@kvack.org, kernel_team@skhynix.com, 42.hyeyoo@gmail.com,
David Rientjes <rientjes@google.com>,
Shivank Garg <shivankg@amd.com>,
Aneesh Kumar <AneeshKumar.KizhakeVeetil@arm.com>,
David Hildenbrand <david@redhat.com>,
John Hubbard <jhubbard@nvidia.com>,
Kirill Shutemov <k.shutemov@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mel.gorman@gmail.com>,
"Rao, Bharata Bhasker" <bharata@amd.com>,
Rik van Riel <riel@surriel.com>,
RaghavendraKT <Raghavendra.KodsaraThimmappa@amd.com>,
Wei Xu <weixugc@google.com>, Suyeon Lee <leesuyeon0506@gmail.com>,
Lei Chen <leillc@google.com>,
"Shukla, Santosh" <santosh.shukla@amd.com>,
"Grimm, Jon" <jon.grimm@amd.com>,
sj@kernel.org, shy828301@gmail.com,
Liam Howlett <liam.howlett@oracle.com>,
Gregory Price <gregory.price@memverge.com>,
"Huang, Ying" <ying.huang@linux.alibaba.com>
Subject: Re: [RFC PATCH 4/5] mm/migrate: introduce multi-threaded page copy routine
Date: Sun, 05 Jan 2025 21:01:48 -0500 [thread overview]
Message-ID: <8B66C7BA-96D6-4E04-89F7-13829BF480D7@nvidia.com> (raw)
In-Reply-To: <f8fee669-76bc-48f1-85cb-962ede28d7cd@sk.com>

On 5 Jan 2025, at 20:18, Hyeonggon Yoo wrote:
> On 2025-01-04 2:24 AM, Zi Yan wrote:
>> Now that page copies are batched, multi-threaded copying can be used to
>> increase page copy throughput. Add copy_page_lists_mt() to copy pages in
>> a multi-threaded manner. Empirical data show that more than 32 base pages
>> are needed before multi-threaded copying pays off, so use 32 as the
>> threshold.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> include/linux/migrate.h | 3 +
>> mm/Makefile | 2 +-
>> mm/copy_pages.c | 186 ++++++++++++++++++++++++++++++++++++++++
>> mm/migrate.c | 19 ++--
>> 4 files changed, 199 insertions(+), 11 deletions(-)
>> create mode 100644 mm/copy_pages.c
>>
>
> [...snip...]
>
>> +++ b/mm/copy_pages.c
>> @@ -0,0 +1,186 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Parallel page copy routine.
>> + */
>> +
>> +#include <linux/sysctl.h>
>> +#include <linux/highmem.h>
>> +#include <linux/workqueue.h>
>> +#include <linux/slab.h>
>> +#include <linux/migrate.h>
>> +
>> +
>> +unsigned int limit_mt_num = 4;
>> +
>> +struct copy_item {
>> + char *to;
>> + char *from;
>> + unsigned long chunk_size;
>> +};
>> +
>> +struct copy_page_info {
>> + struct work_struct copy_page_work;
>> + unsigned long num_items;
>> + struct copy_item item_list[];
>> +};
>> +
>> +static void copy_page_routine(char *vto, char *vfrom,
>> + unsigned long chunk_size)
>> +{
>> + memcpy(vto, vfrom, chunk_size);
>> +}
>> +
>> +static void copy_page_work_queue_thread(struct work_struct *work)
>> +{
>> + struct copy_page_info *my_work = (struct copy_page_info *)work;
>> + int i;
>> +
>> + for (i = 0; i < my_work->num_items; ++i)
>> + copy_page_routine(my_work->item_list[i].to,
>> + my_work->item_list[i].from,
>> + my_work->item_list[i].chunk_size);
>> +}
>> +
>> +int copy_page_lists_mt(struct list_head *dst_folios,
>> + struct list_head *src_folios, int nr_items)
>> +{
>> + int err = 0;
>> + unsigned int total_mt_num = limit_mt_num;
>> + int to_node = folio_nid(list_first_entry(dst_folios, struct folio, lru));
>> + int i;
>> + struct copy_page_info *work_items[32] = {0};
>> + const struct cpumask *per_node_cpumask = cpumask_of_node(to_node);
>
> What happens here if to_node is a NUMA node without CPUs? (e.g. CXL
> node).
I did not think about that case. There, from_node will be used instead.
If both the source and destination are CPUless nodes, the node of the
executing CPU should probably be used to select the cpumask here.
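Roughly like this (an untested sketch; cpumask_empty() and numa_node_id()
are existing kernel helpers, and from_node here would be folio_nid() of
the first source folio):

	/* Prefer the destination node's CPUs; fall back to the source
	 * node, then to the executing CPU's node, when a node has no
	 * CPUs (e.g. a CXL memory-only node).
	 */
	const struct cpumask *per_node_cpumask = cpumask_of_node(to_node);

	if (cpumask_empty(per_node_cpumask))
		per_node_cpumask = cpumask_of_node(from_node);
	if (cpumask_empty(per_node_cpumask))
		per_node_cpumask = cpumask_of_node(numa_node_id());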
>
> And even with a NUMA node with CPUs I think offloading copies to CPUs
> of either "from node" or "to node" will end up a CPU touching two pages
> in two different NUMA nodes anyway, one page in the local node
> and the other page in the remote node.
>
> In that sense, I don't understand when push_0_pull_1 (introduced in
> patch 5) should be 0 or 1. Am I missing something?
From my experiments, copy throughput differs between pushing data from the
source node's CPUs and pulling data from the destination node's CPUs. On
NVIDIA Grace CPUs, pushing data had higher throughput. Back in 2019, when I
tested on Intel Xeon Broadwell, pulling data had higher throughput. In the
final version, a boot-time benchmark might be needed to decide whether to
push or pull data.
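To make the push/pull distinction concrete (a sketch of the intent, not
the patch 5 code; from_node is assumed to be the source folios' node):

	/* push (push_0_pull_1 == 0): copy threads run on the source
	 * node's CPUs and write to remote memory.
	 * pull (push_0_pull_1 == 1): copy threads run on the
	 * destination node's CPUs and read from remote memory.
	 */
	int copy_nid = push_0_pull_1 ? to_node : from_node;
	const struct cpumask *per_node_cpumask = cpumask_of_node(copy_nid);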
>> + int cpu_id_list[32] = {0};
>> + int cpu;
>> + int max_items_per_thread;
>> + int item_idx;
>> + struct folio *src, *src2, *dst, *dst2;
>> +
>> + total_mt_num = min_t(unsigned int, total_mt_num,
>> + cpumask_weight(per_node_cpumask));
>> +
>> + if (total_mt_num > 32)
>> + total_mt_num = 32;
>> +
>> + /* Each thread gets part of each page, if nr_items < total_mt_num */
>> + if (nr_items < total_mt_num)
>> + max_items_per_thread = nr_items;
>> + else
>> + max_items_per_thread = (nr_items / total_mt_num) +
>> + ((nr_items % total_mt_num) ? 1 : 0);
>> +
>> +
>> + for (cpu = 0; cpu < total_mt_num; ++cpu) {
>> + work_items[cpu] = kzalloc(sizeof(struct copy_page_info) +
>> + sizeof(struct copy_item) * max_items_per_thread,
>> + GFP_NOWAIT);
>> +
>> + if (!work_items[cpu]) {
>> + err = -ENOMEM;
>> + goto free_work_items;
>> + }
>> + }
>
> [...snip...]
>
>> +
>> + /* Wait until it finishes */
>> + for (i = 0; i < total_mt_num; ++i)
>> + flush_work((struct work_struct *)work_items[i]);
>> +
>> +free_work_items:
>> + for (cpu = 0; cpu < total_mt_num; ++cpu)
>> + kfree(work_items[cpu]);
>> +
>> + return err;
>
> Should the kernel re-try migration without multi-threading if it failed
> to allocate memory?
Sure. Will add it in the next version.
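Probably something along these lines in the caller (a sketch; folio_copy()
is the existing sequential helper, and the loop mirrors how the two lists
are walked elsewhere in this series):

	err = copy_page_lists_mt(dst_folios, src_folios, nr_items);
	if (err == -ENOMEM) {
		/* Fall back to single-threaded folio copies. */
		dst = list_first_entry(dst_folios, struct folio, lru);
		list_for_each_entry_safe(src, src2, src_folios, lru) {
			folio_copy(dst, src);
			dst = list_next_entry(dst, lru);
		}
		err = 0;
	}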
Thank you for the reviews.
--
Best Regards,
Yan, Zi