Date: Tue, 28 Jan 2025 12:24:32 +0530
Subject: Re: [LSF/MM/BPF TOPIC] Enhancements to Page Migration with Multi-threading and Batch Offloading to DMA
To: Zi Yan, David Rientjes
Cc: akpm@linux-foundation.org, lsf-pc@lists.linux-foundation.org,
 linux-mm@kvack.org, AneeshKumar.KizhakeVeetil@arm.com,
 baolin.wang@linux.alibaba.com, bharata@amd.com, david@redhat.com,
 gregory.price@memverge.com, honggyu.kim@sk.com, jane.chu@oracle.com,
 jhubbard@nvidia.com, jon.grimm@amd.com, k.shutemov@gmail.com,
 leesuyeon0506@gmail.com, leillc@google.com, liam.howlett@oracle.com,
 linux-kernel@vger.kernel.org, mel.gorman@gmail.com, Michael.Day@amd.com,
 Raghavendra.KodsaraThimmappa@amd.com, riel@surriel.com,
 santosh.shukla@amd.com, shy828301@gmail.com, sj@kernel.org,
 wangkefeng.wang@huawei.com, weixugc@google.com, willy@infradead.org,
 ying.huang@linux.alibaba.com, Jonathan.Cameron@huawei.com
References: <3b59ea3e-04db-ad38-97b1-20cff0f8f17c@google.com> <520F7E0B-E0B7-4A84-9046-B8B5FC6EA9F7@nvidia.com>
From: Shivank Garg
In-Reply-To: <520F7E0B-E0B7-4A84-9046-B8B5FC6EA9F7@nvidia.com>

Hi David, Zi,

On 1/27/2025 6:07 PM, Zi Yan wrote:
> On 27 Jan 2025, at 1:55, David Rientjes wrote:
>
>> On Thu, 23 Jan 2025, Shivank Garg wrote:
>>
>>> Hi all,
>>>
>>> Zi Yan and I would like to propose the topic: Enhancements to Page
>>> Migration with Multi-threading and Batch Offloading to DMA.
>>>
>>
>> I think this would be a very useful topic to discuss, thanks for proposing
>> it.

Thanks for your interest in our proposal.

>>
>>> Page migration is a critical operation in NUMA systems that can incur
>>> significant overheads, affecting memory management performance across
>>> various workloads. For example, copying folios between DRAM NUMA nodes
>>> can take ~25% of the total migration cost for migrating 256MB of data.
>>>
>>> Modern systems are equipped with powerful DMA engines for bulk data
>>> copying, GPUs, and high CPU core counts. Leveraging these hardware
>>> capabilities becomes essential for systems where frequent page promotion
>>> and demotion occur - from large-scale tiered-memory systems with CXL nodes
>>> to CPU-GPU coherent system with GPU memory exposed as NUMA nodes.
>>>
>>
>> Indeed, there are multiple use cases for optimizations in this area. With
>> the ramp of memory tiered systems, I think there will be an even greater
>> reliance on memory migration going forward.
>>
>> Do you have numbers to share on how offloading, even as a proof of
>> concept, moves the needle compared to traditional and sequential memory
>> migration?
>
> For multithreaded page migration, you can see my RFC patchset[1]:
>
> on NVIDIA Grace:
>
> The 32-thread copy throughput can be up to 10x of single thread serial folio
> copy. Batching folio copy not only benefits huge page but also base
> page.
>
> 64KB (GB/s):
>
>        vanilla    mt_1    mt_2    mt_4    mt_8   mt_16   mt_32
> 32        5.43    4.90    5.65    7.31    7.60    8.61    6.43
> 256       6.95    6.89    9.28   14.67   22.41   23.39   23.93
> 512       7.88    7.26   10.15   17.53   27.82   27.88   33.93
> 768       7.65    7.42   10.46   18.59   28.65   29.67   30.76
> 1024      7.46    8.01   10.90   17.77   27.04   32.18   38.80
>
> 2MB mTHP (GB/s):
>
>        vanilla    mt_1    mt_2    mt_4    mt_8   mt_16   mt_32
> 1         5.94    2.90    6.90    8.56   11.16    8.76    6.41
> 2         7.67    5.57    7.11   12.48   17.37   15.68   14.10
> 4         8.01    6.04   10.25   20.14   22.52   27.79   25.28
> 8         8.42    7.00   11.41   24.73   33.96   32.62   39.55
> 16        9.41    6.91   12.23   27.51   43.95   49.15   51.38
> 32       10.23    7.15   13.03   29.52   49.49   69.98   71.51
> 64        9.40    7.37   13.88   30.38   52.00   76.89   79.41
> 128       8.59    7.23   14.20   28.39   49.98   78.27   90.18
> 256       8.43    7.16   14.59   28.14   48.78   76.88   92.28
> 512       8.31    7.78   14.40   26.20   43.31   63.91   75.21
> 768       8.30    7.86   14.83   27.41   46.25   69.85   81.31
> 1024      8.31    7.90   14.96   27.62   46.75   71.76   83.84
>
>
> I also ran it on a two-socket Xeon E5-2650 v4:
>
>
> 4KB (GB/s)
>
> | ---- | ------- | ---- | ---- | ---- | ---- | ----- |
> |      | vanilla | mt_1 | mt_2 | mt_4 | mt_8 | mt_16 |
> | ---- | ------- | ---- | ---- | ---- | ---- | ----- |
> | 512  | 1.12    | 1.19 | 1.20 | 1.26 | 1.27 | 1.35  |
> | 768  | 1.29    | 1.14 | 1.28 | 1.40 | 1.39 | 1.46  |
> | 1024 | 1.19    | 1.25 | 1.34 | 1.51 | 1.52 | 1.53  |
> | 2048 | 1.14    | 1.12 | 1.44 | 1.61 | 1.73 | 1.71  |
> | 4096 | 1.09    | 1.14 | 1.46 | 1.64 | 1.81 | 1.78  |
>
>
>
> 2MB (GB/s)
> | ---- | ------- | ---- | ---- | ----- | ----- | ----- |
> |      | vanilla | mt_1 | mt_2 | mt_4  | mt_8  | mt_16 |
> | ---- | ------- | ---- | ---- | ----- | ----- | ----- |
> | 1    | 2.03    | 2.21 | 2.69 | 2.93  | 3.17  | 3.14  |
> | 2    | 2.28    | 2.13 | 3.54 | 4.50  | 4.72  | 4.72  |
> | 4    | 2.92    | 2.93 | 4.44 | 6.50  | 7.24  | 7.06  |
> | 8    | 2.29    | 2.37 | 3.21 | 6.86  | 8.83  | 8.44  |
> | 16   | 2.10    | 2.09 | 4.57 | 8.06  | 8.32  | 9.70  |
> | 32   | 2.22    | 2.21 | 4.43 | 8.96  | 9.37  | 11.54 |
> | 64   | 2.35    | 2.35 | 3.15 | 7.77  | 10.77 | 13.61 |
> | 128  | 2.48    | 2.53 | 5.12 | 8.18  | 11.01 | 15.62 |
> | 256  | 2.55    | 2.53 | 5.44 | 8.25  | 12.73 | 16.49 |
> | 512  | 2.61    | 2.52 | 5.73 | 11.26 | 17.18 | 16.97 |
> | 768  | 2.55    | 2.53 | 5.90 | 11.41 | 14.86 | 17.15 |
> | 1024 | 2.56    | 2.52 | 5.99 | 11.46 | 16.77 | 17.25 |
>
>
>
> Shivank ran it on AMD EPYC Zen 5, after some tuning (spread threads on different CCDs):
>
> 2MB pages (GB/s):
> nr_pages vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
> 1          10.74   11.04    4.68    8.17    6.47    6.09    3.97    6.20
> 2          12.44    4.90   11.19   14.10   15.33    8.45   10.09    9.97
> 4          14.82    9.80   11.93   18.35   21.82   17.09   10.53    7.51
> 8          16.13    9.91   15.26   11.85   26.53   13.09   12.71   13.75
> 16         15.99    8.81   13.84   22.43   33.89   11.91   12.30   13.26
> 32         14.03   11.37   17.54   23.96   57.07   18.78   19.51   21.29
> 64         15.79    9.55   22.19   33.17   57.18   65.51   55.39   62.53
> 128        18.22   16.65   21.49   30.73   52.99   61.05   58.44   60.38
> 256        19.78   20.56   24.72   34.94   56.73   71.11   61.83   62.77
> 512        20.27   21.40   27.47   39.23   65.72   67.97   70.48   71.39
> 1024       20.48   21.48   27.48   38.30   68.62   77.94   78.00   78.95
>
>
>
>>
>>> Existing page migration performs sequential page copying, underutilizing
>>> modern CPU architectures and high-bandwidth memory subsystems.
>>>
>>> We have proposed and posted RFCs to enhance page migration through three
>>> key techniques:
>>> 1. Batching migration operations for bulk copying data [1]
>>> 2. Multi-threaded folio copying [2]
>>> 3. DMA offloading to hardware accelerators [1]
>>>
>>
>> Curious: does memory migration of pages that are actively undergoing DMA
>> with hardware assist fit into any of these?
>
> It should be similar to 3, but in this case, DMA is used to copy pages
> between NUMA nodes, whereas traditional DMA page migration is used to copy
> pages between host and devices.
>

I'm planning to test using SDXi as the DMA engine for offload, and AFAIU it
doesn't support migrating pages that are actively undergoing DMA.

>>
>>> By employing batching and multi-threaded folio copying, we are able to
>>> achieve significant improvements in page migration throughput for large
>>> pages.
>>>
>>> Discussion points:
>>> 1. Performance:
>>>    a. Policy decision for DMA and CPU selection
>>>    b. Platform-specific scheduling of folio-copy worker threads for better
>>>       bandwidth utilization
>>
>> Why platform specific? I *assume* this means a generic framework that can
>> optimize for scheduling based on the underlying hardware and not specific
>> implementations that can only be used on AMD, for example. Is that the
>> case?
>
> I think the framework will be generic but the CPU scheduling (which core
> to choose for page copying) will be different from vendor to vendor.
>
> Due to existing CPU structure, like chiplet design, a single CPU scheduling
> algorithm does not fit for CPUs from different vendors. For example, on
> NVIDIA Grace, you can use any CPUs to copy pages and always achieve high
> page copy throughput, but on AMD CPUs with multiple CCDs, spreading copy
> threads across different CCDs can achieve much higher page copy throughput
> than putting all threads in a single CCD. I assume Intel CPUs with chiplet
> design would see the same result.

Thank you Zi for helping with results and queries.

>
>>
>>>    c. Using Non-temporal instructions for CPU-based memcpy
>>>    d. Upscaling/downscaling worker threads based on migration size, CPU
>>>       availability (system load), bandwidth saturation, etc.
>>> 2. Interface requirements with DMA hardware:
>>>    a. Standardizing APIs for DMA drivers and support for different DMA
>>>       drivers
>>>    b. Enhancing DMA drivers for bulk copying (e.g., SDXi Engine)
>>> 3. Resources Accounting:
>>>    a. CPU cgroups accounting and fairness [3]
>>>    b. Who bears migration cost? - (Migration cost attribution)
>>>
>>> References:
>>> [1] https://lore.kernel.org/all/20240614221525.19170-1-shivankg@amd.com
>>> [2] https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com
>>> [3] https://lore.kernel.org/all/CAHbLzkpoKP0fVZP5b10wdzAMDLWysDy7oH0qaUssiUXj80R6bw@mail.gmail.com
>>>
>
> [1] https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com/
> --
> Best Regards,
> Yan, Zi
>
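
To make the CCD point concrete, here is a minimal userspace sketch of the
"spread copy threads across different CCDs" placement. It is not code from
either RFC: the function name pick_copy_cpus and the 4-CCD topology are
hypothetical, made up purely for illustration, and a real policy would run
in the kernel and use the scheduler's topology information instead of a
hand-written dictionary.

```python
from itertools import zip_longest


def pick_copy_cpus(ccd_to_cpus, nr_threads):
    """Pick CPUs for copy workers, visiting every CCD before reusing one.

    ccd_to_cpus: dict mapping a CCD id to the list of CPU ids on that CCD
                 (hypothetical input format, assumed for this sketch).
    nr_threads:  number of copy worker threads to place.
    """
    # Interleave the per-CCD CPU lists: the first CPU of every CCD,
    # then the second CPU of every CCD, and so on.
    columns = zip_longest(*(ccd_to_cpus[ccd] for ccd in sorted(ccd_to_cpus)))
    interleaved = [cpu for column in columns for cpu in column
                   if cpu is not None]
    # If nr_threads exceeds the CPU count, every CPU is returned once.
    return interleaved[:nr_threads]


# Hypothetical topology: 4 CCDs with 4 CPUs each (CPUs 0-3, 4-7, ...).
topology = {ccd: list(range(ccd * 4, ccd * 4 + 4)) for ccd in range(4)}

print(pick_copy_cpus(topology, 4))  # [0, 4, 8, 12]: one CPU per CCD
print(pick_copy_cpus(topology, 6))  # [0, 4, 8, 12, 1, 5]
```

With 4 threads each worker lands on a different CCD; a second CPU on the
same CCD is only used once every CCD already has a worker. On a platform
like Grace, where any CPU gives similar copy throughput, the same interface
could simply return the first nr_threads CPUs.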