From: Zi Yan
To: Shivank Garg
Cc: David Rientjes, Aneesh Kumar, David Hildenbrand, John Hubbard, Kirill Shutemov, Matthew Wilcox, Mel Gorman, "Rao, Bharata Bhasker", Rik van Riel, RaghavendraKT, Wei Xu, Suyeon Lee, Lei Chen, "Shukla, Santosh", "Grimm, Jon", Liam Howlett, Gregory Price, "Huang, Ying"
Subject: Re: [RFC PATCH 0/5] Accelerate page migration with batching and multi threads
Date: Fri, 10 Jan 2025 14:51:01 -0500
Message-ID: <334B7551-7834-44E7-91E6-4AE4C0B382AF@nvidia.com>
References: <20250103172419.4148674-1-ziy@nvidia.com> <600a57ff-a462-4997-a621-f919c2c4fa84@amd.com> <567FDE63-E84E-4B1E-85F4-4E1EB0C2CD26@nvidia.com> <003b0818-a35e-429c-9408-5e7344e981f2@amd.com> <8E1D6790-8A44-48C2-9FA5-66C7AB6CE531@nvidia.com>

On 10 Jan 2025, at 12:05, Zi Yan wrote:

>
>>>
>>>>> main() {
>>>>> ...
>>>>>
>>>>> // code snippet to measure throughput
>>>>> clock_gettime(CLOCK_MONOTONIC, &t1);
>>>>> retcode = move_pages(getpid(), num_pages, pages, nodesArray , statusArray, MPOL_MF_MOVE);
>>>>> clock_gettime(CLOCK_MONOTONIC, &t2);
>>>>>
>>>>> // tput = num_pages*PAGE_SIZE/(t2-t1)
>>>>>
>>>>> ...
>>>>> }
>>>>>
>>>>>
>>>>> Measurements:
>>>>> ============
>>>>> vanilla: base kernel without patchset
>>>>> mt:0 = MT kernel with use_mt_copy=0
>>>>> mt:1..mt:32 = MT kernel with use_mt_copy=1 and thread cnt = 1,2,...,32
>>>>>
>>>>> Measured for both configuration push_0_pull_1=0 and push_0_pull_1=1 and
>>>>> for 4KB migration and THP migration.
>>>>>
>>>>> --------------------
>>>>> #1 push_0_pull_1 = 0 (src node CPUs are used)
>>>>>
>>>>> #1.1 THP=Never, 4KB (GB/s):
>>>>> nr_pages  vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>>> 512          1.28    1.28    1.92    1.80    2.24    2.35    2.22    2.17
>>>>> 4096         2.40    2.40    2.51    2.58    2.83    2.72    2.99    3.25
>>>>> 8192         3.18    2.88    2.83    2.69    3.49    3.46    3.57    3.80
>>>>> 16348        3.17    2.94    2.96    3.17    3.63    3.68    4.06    4.15
>>>>>
>>>>> #1.2 THP=Always, 2MB (GB/s):
>>>>> nr_pages  vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>>> 512          4.31    5.02    3.39    3.40    3.33    3.51    3.91    4.03
>>>>> 1024         7.13    4.49    3.58    3.56    3.91    3.87    4.39    4.57
>>>>> 2048         5.26    6.47    3.91    4.00    3.71    3.85    4.97    6.83
>>>>> 4096         9.93    7.77    4.58    3.79    3.93    3.53    6.41    4.77
>>>>> 8192         6.47    6.33    4.37    4.67    4.52    4.39    5.30    5.37
>>>>> 16348        7.66    8.00    5.20    5.22    5.24    5.28    6.41    7.02
>>>>> 32768        8.56    8.62    6.34    6.20    6.20    6.19    7.18    8.10
>>>>> 65536        9.41    9.40    7.14    7.15    7.15    7.19    7.96    8.89
>>>>> 262144      10.17   10.19    7.26    7.90    7.98    8.05    9.46   10.30
>>>>> 524288      10.40    9.95    7.25    7.93    8.02    8.76    9.55   10.30
>>>>>
>>>>> --------------------
>>>>> #2 push_0_pull_1 = 1 (dst node CPUs are used):
>>>>>
>>>>> #2.1 THP=Never 4KB (GB/s):
>>>>> nr_pages  vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>>> 512          1.28    1.36    2.01    2.74    2.33    2.31    2.53    2.96
>>>>> 4096         2.40    2.84    2.94    3.04    3.40    3.23    3.31    4.16
>>>>> 8192         3.18    3.27    3.34    3.94    3.77    3.68    4.23    4.76
>>>>> 16348        3.17    3.42    3.66    3.21    3.82    4.40    4.76    4.89
>>>>>
>>>>> #2.2 THP=Always 2MB (GB/s):
>>>>> nr_pages  vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>>> 512          4.31    5.91    4.03    3.73    4.26    4.13    4.78    3.44
>>>>> 1024         7.13    6.83    4.60    5.13    5.03    5.19    5.94    7.25
>>>>> 2048         5.26    7.09    5.20    5.69    5.83    5.73    6.85    8.13
>>>>> 4096         9.93    9.31    4.90    4.82    4.82    5.26    8.46    8.52
>>>>> 8192         6.47    7.63    5.66    5.85    5.75    6.14    7.45    8.63
>>>>> 16348        7.66   10.00    6.35    6.54    6.66    6.99    8.18   10.21
>>>>> 32768        8.56    9.78    7.06    7.41    7.76    9.02    9.55   11.92
>>>>> 65536        9.41   10.00    8.19    9.20    9.32    8.68   11.00   13.31
>>>>> 262144      10.17   11.17    9.01    9.96    9.99   10.00   11.70   14.27
>>>>> 524288      10.40   11.38    9.07    9.98   10.01   10.09   11.95   14.48
>>>>>
>>>>> Note:
>>>>> 1. For THP = Never: I'm doing for 16X pages to keep total size same for your
>>>>> experiment with 64KB pagesize)
>>>>> 2. For THP = Always: nr_pages = Number of 4KB pages moved.
>>>>> nr_pages=512 => 512 4KB pages => 1 2MB page)
>>>>>
>>>>>
>>>>> I'm seeing little (1.5X in some cases) to no benefits. The performance scaling is
>>>>> relatively flat across thread counts.
>>>>>
>>>>> Is it possible I'm missing something in my testing?
>>>>>
>>>>> Could the base page size difference (4KB vs 64KB) be playing a role in
>>>>> the scaling behavior? How the performance varies with 4KB pages on your system?
>>>>>
>>>>> I'd be happy to work with you on investigating this differences.
>>>>> Let me know if you'd like any additional test data or if there are specific
>>>>> configurations I should try.
>>>>
>>>> The results surprises me, since I was able to achieve ~9GB/s when migrating
>>>> 16 2MB THPs with 16 threads on a two socket system with Xeon E5-2650 v3 @ 2.30GHz
>>>> (a 19.2GB/s bandwidth QPI link between two sockets) back in 2019[1].
>>>> These are 10-year-old Haswell CPUs. And your results above show that EPYC 5 can
>>>> only achieve ~4GB/s when migrating 512 2MB THPs with 16 threads. It just does
>>>> not make sense.
>>>>
>>>> One thing you might want to try is to set init_on_alloc=0 in your boot
>>>> parameters to use folio_zero_user() instead of GFP_ZERO to zero pages. That
>>>> might reduce the time spent on page zeros.
>>>>
>>>> I am also going to rerun the experiments locally on x86_64 boxes to see if your
>>>> results can be replicated.
>>>>
>>>> Thank you for the review and running these experiments. I really appreciate
>>>> it.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/20190404020046.32741-1-zi.yan@sent.com/
>>>>
>>>
>>> Using init_on_alloc=0 gave significant performance gain over the last experiment
>>> but I'm still missing the performance scaling you observed.
>>
>> It might be the difference between x86 and ARM64, but I am not 100% sure.
>> Based on your data below, 2 or 4 threads seem to the sweep spot for
>> the multi-threaded method on AMD CPUs. BTW, what is the bandwidth between
>> two sockets in your system? From Figure 10 in [1], I see the InfiniteBand
>> between two AMD EPYC 7601 @ 2.2GHz was measured at ~12GB/s unidirectional,
>> ~25GB/s bidirectional. I wonder if your results below are cross-socket
>> link bandwidth limited.
>>
>> From my results, NVIDIA Grace CPU can achieve high copy throughput
>> with more threads between two sockets, maybe part of the reason is that
>> its cross-socket link theoretical bandwidth is 900GB/s bidirectional.
>
> I talked to my colleague about this and he mentioned about CCD architecture
> on AMD CPUs. IIUC, one or two cores from one CCD can already saturate
> the CCD’s outgoing bandwidth and all CPUs are enumerated from one CCD to
> another. This means my naive scheduling algorithm, which use CPUs from
> 0 to N threads, uses all cores from one CDD first, then move to another
> CCD. It is not able to saturate the cross-socket bandwidth. Does it make
> sense to you?
>
> If yes, can you please change the my cpu selection code in mm/copy_pages.c:
>
> +       /* TODO: need a better cpu selection method */
> +       for_each_cpu(cpu, per_node_cpumask) {
> +               if (i >= total_mt_num)
> +                       break;
> +               cpu_id_list[i] = cpu;
> +               ++i;
> +       }
>
> to select CPUs from as many CCDs as possible and rerun the tests.
> That might boost the page migration throughput on AMD CPUs more.
>
> Thanks.
>
>>>
>>> THP Never
>>> nr_pages  vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>> 512          1.40    1.43    2.79    3.48    3.63    3.73    3.63    3.57
>>> 4096         2.54    3.32    3.18    4.65    4.83    5.11    5.39    5.78
>>> 8192         3.35    4.40    4.39    4.71    3.63    5.04    5.33    6.00
>>> 16348        3.76    4.50    4.44    5.33    5.41    5.41    6.47    6.41
>>>
>>> THP Always
>>> nr_pages  vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>> 512          5.21    5.47    5.77    6.92    3.71    2.75    7.54    7.44
>>> 1024         6.10    7.65    8.12    8.41    8.87    8.55    9.13   11.36
>>> 2048         6.39    6.66    9.58    8.92   10.75   12.99   13.33   12.23
>>> 4096         7.33   10.85    8.22   13.57   11.43   10.93   12.53   16.86
>>> 8192         7.26    7.46    8.88   11.82   10.55   10.94   13.27   14.11
>>> 16348        9.07    8.53   11.82   14.89   12.97   13.22   16.14   18.10
>>> 32768       10.45   10.55   11.79   19.19   16.85   17.56   20.58   26.57
>>> 65536       11.00   11.12   13.25   18.27   16.18   16.11   19.61   27.73
>>> 262144      12.37   12.40   15.65   20.00   19.25   19.38   22.60   31.95
>>> 524288      12.44   12.33   15.66   19.78   19.06   18.96   23.31   32.29
>>
>> [1] https://www.dell.com/support/kbdoc/en-us/000143393/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study

BTW, I reran the experiments on a two-socket Xeon E5-2650 v4 @ 2.20GHz
system with the pull method. The 4KB results are not very impressive, at
most 60% more throughput, but 2MB migration can get ~6.5x of vanilla
kernel throughput using 8 or 16 threads.

4KB (GB/s)

| ---- | ------- | ---- | ---- | ---- | ---- | ----- |
|      | vanilla | mt_1 | mt_2 | mt_4 | mt_8 | mt_16 |
| ---- | ------- | ---- | ---- | ---- | ---- | ----- |
| 512  | 1.12    | 1.19 | 1.20 | 1.26 | 1.27 | 1.35  |
| 768  | 1.29    | 1.14 | 1.28 | 1.40 | 1.39 | 1.46  |
| 1024 | 1.19    | 1.25 | 1.34 | 1.51 | 1.52 | 1.53  |
| 2048 | 1.14    | 1.12 | 1.44 | 1.61 | 1.73 | 1.71  |
| 4096 | 1.09    | 1.14 | 1.46 | 1.64 | 1.81 | 1.78  |

2MB (GB/s)

| ---- | ------- | ---- | ---- | ----- | ----- | ----- |
|      | vanilla | mt_1 | mt_2 | mt_4  | mt_8  | mt_16 |
| ---- | ------- | ---- | ---- | ----- | ----- | ----- |
| 1    | 2.03    | 2.21 | 2.69 | 2.93  | 3.17  | 3.14  |
| 2    | 2.28    | 2.13 | 3.54 | 4.50  | 4.72  | 4.72  |
| 4    | 2.92    | 2.93 | 4.44 | 6.50  | 7.24  | 7.06  |
| 8    | 2.29    | 2.37 | 3.21 | 6.86  | 8.83  | 8.44  |
| 16   | 2.10    | 2.09 | 4.57 | 8.06  | 8.32  | 9.70  |
| 32   | 2.22    | 2.21 | 4.43 | 8.96  | 9.37  | 11.54 |
| 64   | 2.35    | 2.35 | 3.15 | 7.77  | 10.77 | 13.61 |
| 128  | 2.48    | 2.53 | 5.12 | 8.18  | 11.01 | 15.62 |
| 256  | 2.55    | 2.53 | 5.44 | 8.25  | 12.73 | 16.49 |
| 512  | 2.61    | 2.52 | 5.73 | 11.26 | 17.18 | 16.97 |
| 768  | 2.55    | 2.53 | 5.90 | 11.41 | 14.86 | 17.15 |
| 1024 | 2.56    | 2.52 | 5.99 | 11.46 | 16.77 | 17.25 |

Best Regards,
Yan, Zi
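
For the CCD discussion in the quoted text above, here is a minimal userspace sketch of one way to spread copy threads across CCDs instead of filling one CCD first. It is not code from the patchset: pick_cpus_across_ccds() and its cpus_per_ccd parameter are hypothetical names, and it assumes that CPUs are enumerated CCD by CCD with the same number of CPUs in each CCD. A kernel version would have to take the grouping from real topology information rather than from a fixed count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patchset): choose up to `want` CPUs from
 * cpus[0..ncpus) so that consecutive picks come from different CCDs.
 *
 * Assumption: cpus[] is ordered CCD by CCD, with `cpus_per_ccd` entries per
 * CCD (the last CCD may be shorter). Returns the number of CPUs written to
 * out[].
 */
static size_t pick_cpus_across_ccds(const int *cpus, size_t ncpus,
                                    size_t cpus_per_ccd,
                                    int *out, size_t want)
{
    size_t picked = 0;

    assert(cpus != NULL && out != NULL);
    if (cpus_per_ccd == 0)
        return 0;

    /* Number of CCDs covered by the list, rounding up for a partial one. */
    size_t nccds = (ncpus + cpus_per_ccd - 1) / cpus_per_ccd;

    /* Outer loop: position inside a CCD. Inner loop: walk over the CCDs. */
    for (size_t slot = 0; slot < cpus_per_ccd && picked < want; slot++) {
        for (size_t ccd = 0; ccd < nccds && picked < want; ccd++) {
            size_t idx = ccd * cpus_per_ccd + slot;

            if (idx < ncpus)
                out[picked++] = cpus[idx];
        }
    }
    return picked;
}
```

With eight CPUs numbered 0 to 7 and four CPUs per CCD, asking for four CPUs picks 0, 4, 1, 5, so two threads land on each CCD rather than all four on the first one.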