Message-ID: <3212f4d5-afdb-47fe-a2ea-ad61c69836af@amd.com>
Date: Thu, 16 Jan 2025 10:27:49 +0530
From: Shivank Garg
Subject: Re: [RFC PATCH 0/5] Accelerate page migration with batching and multi threads
To: Zi Yan
Cc: linux-mm@kvack.org, David Rientjes, Aneesh Kumar, David Hildenbrand,
 John Hubbard, Kirill Shutemov, Matthew Wilcox, Mel Gorman,
 "Rao, Bharata Bhasker", Rik van Riel, RaghavendraKT, Wei Xu, Suyeon Lee,
 Lei Chen, "Shukla, Santosh", "Grimm, Jon", sj@kernel.org,
 shy828301@gmail.com, Liam Howlett, Gregory Price, "Huang, Ying"
References: <20250103172419.4148674-1-ziy@nvidia.com>
 <600a57ff-a462-4997-a621-f919c2c4fa84@amd.com>
 <567FDE63-E84E-4B1E-85F4-4E1EB0C2CD26@nvidia.com>
 <003b0818-a35e-429c-9408-5e7344e981f2@amd.com>
 <8E1D6790-8A44-48C2-9FA5-66C7AB6CE531@nvidia.com>
 <334B7551-7834-44E7-91E6-4AE4C0B382AF@nvidia.com>
In-Reply-To: <334B7551-7834-44E7-91E6-4AE4C0B382AF@nvidia.com>

On 1/11/2025 1:21 AM, Zi Yan wrote:

>>> BTW, I notice that you called dmaengine_get_dma_device() in folios_copy_dma(),
>>> which would incur a huge overhead, based on my past experience using DMA engine
>>> for page copy. I know it is needed to make sure DMA is still present, but
>>> its cost needs to be minimized to make DMA folio copy usable. Otherwise,
>>> the 768MB/s DMA copy throughput from your cover letter cannot convince people
>>> to use it for page migration, since single CPU can achieve more than that,
>>> as you showed in the table below.

Thank you for pointing this out.
I'm learning about DMAEngine and will look more into the DMA driver part.

>>>> Using init_on_alloc=0 gave significant performance gain over the last experiment
>>>> but I'm still missing the performance scaling you observed.
>>>
>>> It might be the difference between x86 and ARM64, but I am not 100% sure.
>>> Based on your data below, 2 or 4 threads seem to be the sweet spot for
>>> the multi-threaded method on AMD CPUs. BTW, what is the bandwidth between
>>> two sockets in your system? From Figure 10 in [1], I see the InfiniteBand
>>> between two AMD EPYC 7601 @ 2.2GHz was measured at ~12GB/s unidirectional,
>>> ~25GB/s bidirectional. I wonder if your results below are cross-socket
>>> link bandwidth limited.

I tested the cross-socket bandwidth on my EPYC Zen 5 system and am easily
getting more than 10x that bandwidth. I don't think bandwidth is an issue here.

>>>
>>> From my results, NVIDIA Grace CPU can achieve high copy throughput
>>> with more threads between two sockets, maybe part of the reason is that
>>> its cross-socket link theoretical bandwidth is 900GB/s bidirectional.
>>
>> I talked to my colleague about this and he mentioned about CCD architecture
>> on AMD CPUs. IIUC, one or two cores from one CCD can already saturate
>> the CCD’s outgoing bandwidth and all CPUs are enumerated from one CCD to
>> another. This means my naive scheduling algorithm, which uses CPUs from
>> 0 to N threads, uses all cores from one CCD first, then moves to another
>> CCD. It is not able to saturate the cross-socket bandwidth. Does it make
>> sense to you?
>>
>> If yes, can you please change my CPU selection code in mm/copy_pages.c:

This makes sense. I first tried distributing work threads across different
CCDs, which yielded better results.

I also switched my system to the NPS-2 config (2 nodes per socket). This was
done to eliminate cross-socket connections and variables by focusing on
intra-socket page migrations.

Cross-Socket (Node 0 -> Node 2)
THP Always (2 MB pages)
nr_pages   vanilla   mt:0    mt:1    mt:2    mt:4    mt:8    mt:16   mt:32
262144     12.37     12.52   15.72   24.94   30.40   33.23   34.68   29.67
524288     12.44     12.19   15.70   24.96   32.72   33.40   35.40   29.18

Intra-Socket (Node 0 -> Node 1)
nr_pages   vanilla   mt:0    mt:1    mt:2    mt:4    mt:8    mt:16   mt:32
262144     12.37     17.10   18.65   26.05   35.56   37.80   33.73   29.29
524288     12.44     16.73   18.87   24.34   35.63   37.49   33.79   29.76

I have temporarily hardcoded the CPU assignments and will work on improving the
CPU selection code.

>>
>> +	/* TODO: need a better cpu selection method */
>> +	for_each_cpu(cpu, per_node_cpumask) {
>> +		if (i >= total_mt_num)
>> +			break;
>> +		cpu_id_list[i] = cpu;
>> +		++i;
>> +	}
>>
>> to select CPUs from as many CCDs as possible and rerun the tests.
>> That might boost the page migration throughput on AMD CPUs more.
>>
>> Thanks.
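
For reference, here is a minimal user-space sketch of the CCD-interleaved
selection suggested above. It is not the code from the patch: the helper name
pick_cpus_across_ccds() and the cpus_per_ccd parameter are hypothetical, and it
assumes CPUs are enumerated CCD by CCD with a fixed number of CPUs per CCD. An
in-kernel version would need to take the CCD boundaries from the CPU topology
rather than from a fixed stride.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not from the patch series).
 *
 * Pick up to `want` CPUs from cpus[0..ncpus), taking one CPU from each CCD
 * in turn before taking a second CPU from any CCD.
 *
 * Assumption: cpus[] is ordered CCD by CCD, with `cpus_per_ccd` consecutive
 * entries per CCD (the last CCD may be partially populated).
 *
 * Returns the number of CPUs written to out[].
 */
static size_t pick_cpus_across_ccds(const int *cpus, size_t ncpus,
				    size_t cpus_per_ccd,
				    int *out, size_t want)
{
	size_t n = 0;
	size_t nccd, round, ccd;

	if (cpus_per_ccd == 0)
		return 0;

	/* Number of CCDs, rounding up for a partially populated last CCD. */
	nccd = (ncpus + cpus_per_ccd - 1) / cpus_per_ccd;

	/* Round k takes the k-th CPU of every CCD before round k+1 starts. */
	for (round = 0; round < cpus_per_ccd && n < want; round++) {
		for (ccd = 0; ccd < nccd && n < want; ccd++) {
			size_t idx = ccd * cpus_per_ccd + round;

			if (idx < ncpus)
				out[n++] = cpus[idx];
		}
	}
	return n;
}
```

With 16 CPUs, 8 CPUs per CCD and 4 threads requested, this picks CPUs 0, 8, 1, 9
instead of 0, 1, 2, 3, so the copy threads are spread over both CCDs.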
>>
>>>>
>>>> THP Never
>>>> nr_pages   vanilla   mt:0    mt:1    mt:2    mt:4    mt:8    mt:16   mt:32
>>>> 512        1.40      1.43    2.79    3.48    3.63    3.73    3.63    3.57
>>>> 4096       2.54      3.32    3.18    4.65    4.83    5.11    5.39    5.78
>>>> 8192       3.35      4.40    4.39    4.71    3.63    5.04    5.33    6.00
>>>> 16348      3.76      4.50    4.44    5.33    5.41    5.41    6.47    6.41
>>>>
>>>> THP Always
>>>> nr_pages   vanilla   mt:0    mt:1    mt:2    mt:4    mt:8    mt:16   mt:32
>>>> 512        5.21      5.47    5.77    6.92    3.71    2.75    7.54    7.44
>>>> 1024       6.10      7.65    8.12    8.41    8.87    8.55    9.13    11.36
>>>> 2048       6.39      6.66    9.58    8.92    10.75   12.99   13.33   12.23
>>>> 4096       7.33      10.85   8.22    13.57   11.43   10.93   12.53   16.86
>>>> 8192       7.26      7.46    8.88    11.82   10.55   10.94   13.27   14.11
>>>> 16348      9.07      8.53    11.82   14.89   12.97   13.22   16.14   18.10
>>>> 32768      10.45     10.55   11.79   19.19   16.85   17.56   20.58   26.57
>>>> 65536      11.00     11.12   13.25   18.27   16.18   16.11   19.61   27.73
>>>> 262144     12.37     12.40   15.65   20.00   19.25   19.38   22.60   31.95
>>>> 524288     12.44     12.33   15.66   19.78   19.06   18.96   23.31   32.29
>>>
>>> [1] https://www.dell.com/support/kbdoc/en-us/000143393/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study
>
>
> BTW, I rerun the experiments on a two socket Xeon E5-2650 v4 @ 2.20GHz system with pull method.
> The 4KB is not very impressive, at most 60% more throughput, but 2MB can get ~6.5x of
> vanilla kernel throughput using 8 or 16 threads.
>
>
> 4KB (GB/s)
>
> | ---- | ------- | ---- | ---- | ---- | ---- | ----- |
> |      | vanilla | mt_1 | mt_2 | mt_4 | mt_8 | mt_16 |
> | ---- | ------- | ---- | ---- | ---- | ---- | ----- |
> | 512  | 1.12    | 1.19 | 1.20 | 1.26 | 1.27 | 1.35  |
> | 768  | 1.29    | 1.14 | 1.28 | 1.40 | 1.39 | 1.46  |
> | 1024 | 1.19    | 1.25 | 1.34 | 1.51 | 1.52 | 1.53  |
> | 2048 | 1.14    | 1.12 | 1.44 | 1.61 | 1.73 | 1.71  |
> | 4096 | 1.09    | 1.14 | 1.46 | 1.64 | 1.81 | 1.78  |
>
>
>
> 2MB (GB/s)
> | ---- | ------- | ---- | ---- | ----- | ----- | ----- |
> |      | vanilla | mt_1 | mt_2 | mt_4  | mt_8  | mt_16 |
> | ---- | ------- | ---- | ---- | ----- | ----- | ----- |
> | 1    | 2.03    | 2.21 | 2.69 | 2.93  | 3.17  | 3.14  |
> | 2    | 2.28    | 2.13 | 3.54 | 4.50  | 4.72  | 4.72  |
> | 4    | 2.92    | 2.93 | 4.44 | 6.50  | 7.24  | 7.06  |
> | 8    | 2.29    | 2.37 | 3.21 | 6.86  | 8.83  | 8.44  |
> | 16   | 2.10    | 2.09 | 4.57 | 8.06  | 8.32  | 9.70  |
> | 32   | 2.22    | 2.21 | 4.43 | 8.96  | 9.37  | 11.54 |
> | 64   | 2.35    | 2.35 | 3.15 | 7.77  | 10.77 | 13.61 |
> | 128  | 2.48    | 2.53 | 5.12 | 8.18  | 11.01 | 15.62 |
> | 256  | 2.55    | 2.53 | 5.44 | 8.25  | 12.73 | 16.49 |
> | 512  | 2.61    | 2.52 | 5.73 | 11.26 | 17.18 | 16.97 |
> | 768  | 2.55    | 2.53 | 5.90 | 11.41 | 14.86 | 17.15 |
> | 1024 | 2.56    | 2.52 | 5.99 | 11.46 | 16.77 | 17.25 |
>

I see, thank you for checking.

Meanwhile, I'll continue to explore avenues for performance optimization.

Best Regards,
Shivank

>
>
> Best Regards,
> Yan, Zi
>