Date: Sun, 26 Jan 2025 22:55:48 -0800 (PST)
From: David Rientjes
To: Shivank Garg
Cc: akpm@linux-foundation.org, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, ziy@nvidia.com, AneeshKumar.KizhakeVeetil@arm.com, baolin.wang@linux.alibaba.com, bharata@amd.com, david@redhat.com, gregory.price@memverge.com, honggyu.kim@sk.com, jane.chu@oracle.com, jhubbard@nvidia.com, jon.grimm@amd.com, k.shutemov@gmail.com, leesuyeon0506@gmail.com, leillc@google.com, liam.howlett@oracle.com, linux-kernel@vger.kernel.org, mel.gorman@gmail.com, Michael.Day@amd.com, Raghavendra.KodsaraThimmappa@amd.com, riel@surriel.com, santosh.shukla@amd.com, shy828301@gmail.com, sj@kernel.org, wangkefeng.wang@huawei.com, weixugc@google.com, willy@infradead.org, ying.huang@linux.alibaba.com
Subject: Re: [LSF/MM/BPF TOPIC] Enhancements to Page Migration with Multi-threading and Batch Offloading to DMA
Message-ID: <3b59ea3e-04db-ad38-97b1-20cff0f8f17c@google.com>
On Thu, 23 Jan 2025, Shivank Garg wrote:

> Hi all,
>
> Zi Yan and I would like to propose the topic: Enhancements to Page
> Migration with Multi-threading and Batch Offloading to DMA.
>

I think this would be a very useful topic to discuss; thanks for proposing
it.

> Page migration is a critical operation in NUMA systems that can incur
> significant overheads, affecting memory management performance across
> various workloads. For example, copying folios between DRAM NUMA nodes
> can take ~25% of the total migration cost for migrating 256MB of data.
>
> Modern systems are equipped with powerful DMA engines for bulk data
> copying, GPUs, and high CPU core counts. Leveraging these hardware
> capabilities becomes essential for systems where frequent page promotion
> and demotion occur - from large-scale tiered-memory systems with CXL nodes
> to CPU-GPU coherent systems with GPU memory exposed as NUMA nodes.
>

Indeed, there are multiple use cases for optimizations in this area.  With
the ramp-up of tiered memory systems, I think there will be an even greater
reliance on memory migration going forward.

Do you have numbers to share on how offloading, even as a proof of
concept, moves the needle compared to traditional, sequential memory
migration?
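As a rough illustration of the CPU-side parallelism the proposal refers to ("high CPU core counts"), here is a minimal user-space sketch that splits one large copy into contiguous slices and copies each slice on its own pthread. It is an analogue under stated assumptions, not the kernel code from the posted RFCs; `parallel_copy()`, `copy_worker()` and `struct copy_chunk` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-worker descriptor: one contiguous slice of the copy. */
struct copy_chunk {
	void *dst;
	const void *src;
	size_t len;
};

static void *copy_worker(void *arg)
{
	struct copy_chunk *c = arg;

	memcpy(c->dst, c->src, c->len);
	return NULL;
}

/*
 * Hypothetical helper: copy len bytes from src to dst by splitting the
 * range into nthreads slices, one thread per slice. Slices whose thread
 * cannot be created are copied on the calling thread instead.
 * Returns 0 on success, -1 if the bookkeeping allocation fails.
 */
static int parallel_copy(void *dst, const void *src, size_t len,
			 unsigned int nthreads)
{
	pthread_t *tids;
	struct copy_chunk *chunks;
	size_t base, off = 0;
	unsigned int i, started = 0;

	assert(nthreads > 0);
	if (nthreads == 1 || len < nthreads) {
		memcpy(dst, src, len);	/* not worth splitting */
		return 0;
	}

	tids = calloc(nthreads, sizeof(*tids));
	chunks = calloc(nthreads, sizeof(*chunks));
	if (!tids || !chunks) {
		free(tids);
		free(chunks);
		return -1;
	}

	/* Describe every slice up front; the last one absorbs the remainder. */
	base = len / nthreads;
	for (i = 0; i < nthreads; i++) {
		chunks[i].len = (i == nthreads - 1) ? len - off : base;
		chunks[i].dst = (char *)dst + off;
		chunks[i].src = (const char *)src + off;
		off += chunks[i].len;
	}

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, copy_worker, &chunks[i]) != 0)
			break;
		started++;
	}

	/* Fallback: copy any slices that did not get a thread. */
	for (i = started; i < nthreads; i++)
		copy_worker(&chunks[i]);

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	free(tids);
	free(chunks);
	return 0;
}
```

Timing this against a single memcpy() of the same size (for example, the 256MB case from the proposal) would be one cheap way to get proof-of-concept numbers of the kind asked for above; no results are implied here, and a user-space copy does not capture the unmap/remap and TLB costs of real migration.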
> Existing page migration performs sequential page copying, underutilizing
> modern CPU architectures and high-bandwidth memory subsystems.
>
> We have proposed and posted RFCs to enhance page migration through three
> key techniques:
> 1. Batching migration operations for bulk copying data [1]
> 2. Multi-threaded folio copying [2]
> 3. DMA offloading to hardware accelerators [1]
>

Curious: does memory migration of pages that are actively undergoing DMA
with hardware assist fit into any of these?

> By employing batching and multi-threaded folio copying, we are able to
> achieve significant improvements in page migration throughput for large
> pages.
>
> Discussion points:
> 1. Performance:
>    a. Policy decision for DMA and CPU selection
>    b. Platform-specific scheduling of folio-copy worker threads for better
>       bandwidth utilization

Why platform-specific?  I *assume* this means a generic framework that can
optimize scheduling based on the underlying hardware, not specific
implementations that can only be used on AMD, for example.  Is that the
case?

>    c. Using non-temporal instructions for CPU-based memcpy
>    d. Upscaling/downscaling worker threads based on migration size, CPU
>       availability (system load), bandwidth saturation, etc.
> 2. Interface requirements with DMA hardware:
>    a. Standardizing APIs for DMA drivers and support for different DMA
>       drivers
>    b. Enhancing DMA drivers for bulk copying (e.g., SDXi Engine)
> 3. Resource accounting:
>    a. CPU cgroups accounting and fairness [3]
>    b. Who bears migration cost? (Migration cost attribution)
>
> References:
> [1] https://lore.kernel.org/all/20240614221525.19170-1-shivankg@amd.com
> [2] https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com
> [3] https://lore.kernel.org/all/CAHbLzkpoKP0fVZP5b10wdzAMDLWysDy7oH0qaUssiUXj80R6bw@mail.gmail.com
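Discussion points 1a (DMA vs. CPU selection) and 1d (scaling worker threads by migration size and system load) could be framed as a small policy function. The sketch below is purely hypothetical: `pick_nr_workers()`, `pick_engine()` and both thresholds are made-up placeholders that would need per-platform measurement, not anything from the posted RFCs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables; real values would have to be measured. */
#define MIN_BYTES_PER_THREAD	(2UL << 20)	/* at least 2 MiB per worker */
#define DMA_MIN_BATCH_BYTES	(16UL << 20)	/* offload only 16 MiB+ batches */

enum copy_engine { COPY_CPU, COPY_DMA };

/*
 * Scale workers with batch size so each gets at least MIN_BYTES_PER_THREAD,
 * cap by the number of idle CPUs (system load), and never return zero.
 */
static unsigned int pick_nr_workers(size_t batch_bytes, unsigned int idle_cpus)
{
	size_t want = batch_bytes / MIN_BYTES_PER_THREAD;

	if (want == 0)
		want = 1;
	if (idle_cpus == 0)
		idle_cpus = 1;	/* the caller's own thread can always copy */
	return want < idle_cpus ? (unsigned int)want : idle_cpus;
}

/* Hypothetical engine selection for one migration batch. */
static enum copy_engine pick_engine(size_t batch_bytes, int dma_available,
				    unsigned int idle_cpus)
{
	assert(batch_bytes > 0);
	if (!dma_available)
		return COPY_CPU;
	/* Assumption: large batches amortize per-transfer DMA setup cost. */
	if (batch_bytes >= DMA_MIN_BATCH_BYTES)
		return COPY_DMA;
	/* No spare CPUs: offload rather than steal cycles from the workload. */
	if (idle_cpus == 0)
		return COPY_DMA;
	return COPY_CPU;
}
```

For a 256 MiB batch on a machine with 64 idle CPUs, this picks DMA when an engine is present and otherwise 64 CPU workers. A real policy would also need the bandwidth-saturation and cgroup-accounting inputs listed in points 1d and 3.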