Date: Mon, 6 Jan 2025 10:18:09 +0900
From: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
To: Zi Yan, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, 42.hyeyoo@gmail.com, David Rientjes, Shivank Garg, Aneesh Kumar, David Hildenbrand, John Hubbard, Kirill Shutemov, Matthew Wilcox, Mel Gorman, "Rao, Bharata Bhasker", Rik van Riel, RaghavendraKT, Wei Xu, Suyeon Lee, Lei Chen, "Shukla, Santosh", "Grimm, Jon", sj@kernel.org, shy828301@gmail.com, Liam Howlett, Gregory Price, "Huang, Ying"
Subject: Re: [RFC PATCH 4/5] mm/migrate: introduce multi-threaded page copy routine
In-Reply-To: <20250103172419.4148674-5-ziy@nvidia.com>
References: <20250103172419.4148674-1-ziy@nvidia.com> <20250103172419.4148674-5-ziy@nvidia.com>

On 2025-01-04 2:24 AM, Zi Yan wrote:
> Now page copies are batched, multi-threaded page copy can be used to
> increase page copy throughput. Add copy_page_lists_mt() to copy pages in
> multi-threaded manners. Empirical data show more than 32 base pages are
> needed to show the benefit of using multi-threaded page copy, so use 32 as
> the threshold.
>
> Signed-off-by: Zi Yan
> ---
>  include/linux/migrate.h |   3 +
>  mm/Makefile             |   2 +-
>  mm/copy_pages.c         | 186 ++++++++++++++++++++++++++++++++++++++++
>  mm/migrate.c            |  19 ++--
>  4 files changed, 199 insertions(+), 11 deletions(-)
>  create mode 100644 mm/copy_pages.c

> [...snip...]

> +++ b/mm/copy_pages.c
> @@ -0,0 +1,186 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Parallel page copy routine.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +
> +unsigned int limit_mt_num = 4;
> +
> +struct copy_item {
> +	char *to;
> +	char *from;
> +	unsigned long chunk_size;
> +};
> +
> +struct copy_page_info {
> +	struct work_struct copy_page_work;
> +	unsigned long num_items;
> +	struct copy_item item_list[];
> +};
> +
> +static void copy_page_routine(char *vto, char *vfrom,
> +			      unsigned long chunk_size)
> +{
> +	memcpy(vto, vfrom, chunk_size);
> +}
> +
> +static void copy_page_work_queue_thread(struct work_struct *work)
> +{
> +	struct copy_page_info *my_work = (struct copy_page_info *)work;
> +	int i;
> +
> +	for (i = 0; i < my_work->num_items; ++i)
> +		copy_page_routine(my_work->item_list[i].to,
> +				  my_work->item_list[i].from,
> +				  my_work->item_list[i].chunk_size);
> +}
> +
> +int copy_page_lists_mt(struct list_head *dst_folios,
> +		       struct list_head *src_folios, int nr_items)
> +{
> +	int err = 0;
> +	unsigned int total_mt_num = limit_mt_num;
> +	int to_node = folio_nid(list_first_entry(dst_folios, struct folio, lru));
> +	int i;
> +	struct copy_page_info *work_items[32] = {0};
> +	const struct cpumask *per_node_cpumask = cpumask_of_node(to_node);

What happens here if to_node is a NUMA node without CPUs (e.g. a CPU-less
CXL node)? cpumask_of_node() would return an empty mask, total_mt_num
would then become zero, and it looks like no copy work would be queued at
all. (A rough sketch of the fallback I have in mind is at the end of this
mail.)

And even when to_node does have CPUs, I think offloading the copies to
CPUs of either the "from" node or the "to" node still ends up with each
CPU touching two pages in two different NUMA nodes anyway: one page in
its local node and the other page in a remote node.

In that sense, I don't understand when push_0_pull_1 (introduced in
patch 5) should be 0 and when it should be 1. Am I missing something?

> +	int cpu_id_list[32] = {0};
> +	int cpu;
> +	int max_items_per_thread;
> +	int item_idx;
> +	struct folio *src, *src2, *dst, *dst2;
> +
> +	total_mt_num = min_t(unsigned int, total_mt_num,
> +			     cpumask_weight(per_node_cpumask));
> +
> +	if (total_mt_num > 32)
> +		total_mt_num = 32;
> +
> +	/* Each threads get part of each page, if nr_items < totla_mt_num */
> +	if (nr_items < total_mt_num)
> +		max_items_per_thread = nr_items;
> +	else
> +		max_items_per_thread = (nr_items / total_mt_num) +
> +			((nr_items % total_mt_num) ? 1 : 0);
> +
> +
> +	for (cpu = 0; cpu < total_mt_num; ++cpu) {
> +		work_items[cpu] = kzalloc(sizeof(struct copy_page_info) +
> +			sizeof(struct copy_item) * max_items_per_thread,
> +			GFP_NOWAIT);
> +
> +		if (!work_items[cpu]) {
> +			err = -ENOMEM;
> +			goto free_work_items;
> +		}
> +	}

[...snip...]

> +
> +	/* Wait until it finishes */
> +	for (i = 0; i < total_mt_num; ++i)
> +		flush_work((struct work_struct *)work_items[i]);
> +
> +free_work_items:
> +	for (cpu = 0; cpu < total_mt_num; ++cpu)
> +		kfree(work_items[cpu]);
> +
> +	return err;

Should the kernel retry the migration without multi-threading when the
work-item allocation fails, instead of failing the whole batch with
-ENOMEM? A sketch of the fallback I have in mind follows below.
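
In case a fallback is the direction you want to take for the CPU-less
node question above, here is a rough, untested sketch of what I mean. It
leans on node_state()/N_CPU and numa_nearest_node(); whether those are
the right helpers here is an open question, and the snippet is only meant
to replace the per_node_cpumask initialization in copy_page_lists_mt():

	/*
	 * Untested sketch: when to_node is a memory-only node
	 * (e.g. CXL), borrow the CPUs of the nearest node that has
	 * any, so that per_node_cpumask is never empty and
	 * total_mt_num never collapses to zero.
	 */
	int copy_node = to_node;

	if (!node_state(to_node, N_CPU))
		copy_node = numa_nearest_node(to_node, N_CPU);

	per_node_cpumask = cpumask_of_node(copy_node);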
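
And for the -ENOMEM case, an untested sketch of the retry I was thinking
of, on the mm/migrate.c caller's side. I am writing copy_page_lists() for
the single-threaded batched copy helper purely for illustration; I don't
know what the earlier patches in the series actually call it:

	/*
	 * Untested sketch: if the multi-threaded copy cannot
	 * allocate its work items, fall back to the single-threaded
	 * batched copy rather than failing the whole migration batch.
	 */
	rc = copy_page_lists_mt(dst_folios, src_folios, nr_items);
	if (rc == -ENOMEM)
		rc = copy_page_lists(dst_folios, src_folios, nr_items);

---
Hyeonggon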