Date: Thu, 13 Feb 2025 17:17:06 +0900
From: Byungchul Park <byungchul@sk.com>
To: Zi Yan
Cc: linux-mm@kvack.org, David Rientjes, Shivank Garg, Aneesh Kumar,
	David Hildenbrand, John Hubbard, Kirill Shutemov, Matthew Wilcox,
	Mel Gorman, "Rao, Bharata Bhasker", Rik van Riel, RaghavendraKT,
	Wei Xu, Suyeon Lee, Lei Chen, "Shukla, Santosh", "Grimm, Jon",
	sj@kernel.org, shy828301@gmail.com, Liam Howlett, Gregory Price,
	"Huang, Ying", kernel_team@skhynix.com
Subject: Re: [RFC PATCH 0/5] Accelerate page migration with batching and multi threads
Message-ID: <20250213081706.GA36855@system.software.com>
In-Reply-To: <20250103172419.4148674-1-ziy@nvidia.com>
References: <20250103172419.4148674-1-ziy@nvidia.com>
On Fri, Jan 03, 2025 at 12:24:14PM -0500, Zi Yan wrote:
> Hi all,

Hi,

It'd be appreciated if you could cc me from the next posting.

	Byungchul

> 
> This patchset accelerates page migration by batching folio copy operations
> and using multiple CPU threads. It is based on Shivank's "Enhancements to
> Page Migration with Batch Offloading via DMA" patchset [1] and my original
> accelerated page migration patchset [2], and applies on top of
> mm-everything-2025-01-03-05-59. The last patch is for testing purposes and
> should not be considered for merging.
> 
> The motivations are:
> 
> 1. Batching folio copy increases copy throughput.
> This is especially true for base page migrations: folio copy throughput is
> low because kernel activities, like moving folio metadata and updating page
> table entries, sit between two folio copies. Base page sizes are also
> relatively small: 4KB on x86_64 and ARM64, or 64KB on ARM64.
> 
> 2. A single CPU thread has limited copy throughput. Using multiple threads
> is a natural extension to speed up folio copy when no DMA engine is
> available in the system.
> 
> 
> Design
> ===
> 
> It is based on Shivank's patchset and revises MIGRATE_SYNC_NO_COPY
> (renamed to MIGRATE_NO_COPY) to avoid the folio copy operation inside
> migrate_folio_move() and perform the copies in one shot afterwards. A
> copy_page_lists_mt() function is added to use multiple threads to copy
> folios from the src list to the dst list.
> 
> Changes compared to Shivank's patchset (mainly rewrote the batching folio
> copy code)
> ===
> 
> 1. mig_info is removed, so no memory allocation is needed during batched
> folio copies. src->private is used to store the old page state and
> anon_vma after folio metadata is copied from src to dst.
> 
> 2. move_to_new_folio() and migrate_folio_move() are refactored to remove
> redundant code in migrate_folios_batch_move().
> 
> 3. folio_mc_copy() is used for the single-threaded copy code to keep the
> original kernel behavior.
> 
> 
> Performance
> ===
> 
> I benchmarked move_pages() throughput on a two-socket NUMA system with two
> NVIDIA Grace CPUs. The base page size is 64KB. Both 64KB page migration
> and 2MB mTHP page migration are measured.
> 
> The tables below show move_pages() throughput for different configurations
> and different numbers of copied pages. The first column is the number of
> pages migrated; the remaining columns are the configurations, from the
> vanilla Linux kernel to this patchset using 1, 2, 4, 8, 16, or 32 threads.
> The unit is GB/s.
> 
> The 32-thread copy throughput can be up to 10x that of single-threaded
> serial folio copy. Batching folio copy benefits not only huge pages but
> also base pages.
> 
> 64KB (GB/s):
> 
>        vanilla  mt_1   mt_2   mt_4   mt_8   mt_16  mt_32
> 32     5.43     4.90   5.65   7.31   7.60   8.61   6.43
> 256    6.95     6.89   9.28   14.67  22.41  23.39  23.93
> 512    7.88     7.26   10.15  17.53  27.82  27.88  33.93
> 768    7.65     7.42   10.46  18.59  28.65  29.67  30.76
> 1024   7.46     8.01   10.90  17.77  27.04  32.18  38.80
> 
> 2MB mTHP (GB/s):
> 
>        vanilla  mt_1   mt_2   mt_4   mt_8   mt_16  mt_32
> 1      5.94     2.90   6.90   8.56   11.16  8.76   6.41
> 2      7.67     5.57   7.11   12.48  17.37  15.68  14.10
> 4      8.01     6.04   10.25  20.14  22.52  27.79  25.28
> 8      8.42     7.00   11.41  24.73  33.96  32.62  39.55
> 16     9.41     6.91   12.23  27.51  43.95  49.15  51.38
> 32     10.23    7.15   13.03  29.52  49.49  69.98  71.51
> 64     9.40     7.37   13.88  30.38  52.00  76.89  79.41
> 128    8.59     7.23   14.20  28.39  49.98  78.27  90.18
> 256    8.43     7.16   14.59  28.14  48.78  76.88  92.28
> 512    8.31     7.78   14.40  26.20  43.31  63.91  75.21
> 768    8.30     7.86   14.83  27.41  46.25  69.85  81.31
> 1024   8.31     7.90   14.96  27.62  46.75  71.76  83.84
> 
> 
> TODOs
> ===
> 1. The multi-threaded folio copy routine needs to consult the CPU
> scheduler and only use idle CPUs, to avoid interfering with userspace
> workloads. More sophisticated policies could also be used, based on the
> priority of the thread issuing the migration.
> 
> 2. Eliminate memory allocation during the multi-threaded folio copy
> routine if possible.
> 
> 3. A runtime check to decide when to use the multi-threaded folio copy,
> e.g. for the cache hotness issue mentioned by Matthew [3].
> 
> 4. Use non-temporal CPU instructions to avoid cache pollution.
> 
> 5. Explicitly make the multi-threaded folio copy available only on
> !HIGHMEM, since kmap_local_page() would be needed in each kernel folio
> copy worker thread and is expensive.
> 
> 6. A better interface than copy_page_lists_mt(), so that DMA data copy
> can be used as well.
> 
> Let me know your thoughts. Thanks.
> 
> [1] https://lore.kernel.org/linux-mm/20240614221525.19170-1-shivankg@amd.com/
> [2] https://lore.kernel.org/linux-mm/20190404020046.32741-1-zi.yan@sent.com/
> [3] https://lore.kernel.org/linux-mm/Zm0SWZKcRrngCUUW@casper.infradead.org/
> 
> Byungchul Park (1):
>   mm: separate move/undo doing on folio list from migrate_pages_batch()
> 
> Zi Yan (4):
>   mm/migrate: factor out code in move_to_new_folio() and
>     migrate_folio_move()
>   mm/migrate: add migrate_folios_batch_move to batch the folio move
>     operations
>   mm/migrate: introduce multi-threaded page copy routine
>   test: add sysctl for folio copy tests and adjust
>     NR_MAX_BATCHED_MIGRATION
> 
>  include/linux/migrate.h      |   3 +
>  include/linux/migrate_mode.h |   2 +
>  include/linux/mm.h           |   4 +
>  include/linux/sysctl.h       |   1 +
>  kernel/sysctl.c              |  29 ++-
>  mm/Makefile                  |   2 +-
>  mm/copy_pages.c              | 190 +++++++++++++++
>  mm/migrate.c                 | 443 +++++++++++++++++++++++++++--------
>  8 files changed, 577 insertions(+), 97 deletions(-)
>  create mode 100644 mm/copy_pages.c
> 
> -- 
> 2.45.2