From: Zi Yan
To: Shivank Garg
Cc: David Rientjes, Aneesh Kumar, David Hildenbrand, John Hubbard,
    Kirill Shutemov, Matthew Wilcox, Mel Gorman, "Rao, Bharata Bhasker",
    Rik van Riel, RaghavendraKT, Wei Xu, Suyeon Lee, Lei Chen,
    "Shukla, Santosh", "Grimm, Jon", Liam Howlett, Gregory Price,
    "Huang, Ying"
Subject: Re: [RFC PATCH 0/5] Accelerate page migration with batching and multi threads
Date: Fri, 10 Jan 2025 12:05:29 -0500
In-Reply-To: <8E1D6790-8A44-48C2-9FA5-66C7AB6CE531@nvidia.com>
References: <20250103172419.4148674-1-ziy@nvidia.com>
    <600a57ff-a462-4997-a621-f919c2c4fa84@amd.com>
    <567FDE63-E84E-4B1E-85F4-4E1EB0C2CD26@nvidia.com>
    <003b0818-a35e-429c-9408-5e7344e981f2@amd.com>
    <8E1D6790-8A44-48C2-9FA5-66C7AB6CE531@nvidia.com>

>>
>>>> main() {
>>>> ...
>>>>
>>>>     // code snippet to measure throughput
>>>>     clock_gettime(CLOCK_MONOTONIC, &t1);
>>>>     retcode = move_pages(getpid(), num_pages, pages, nodesArray , statusArray, MPOL_MF_MOVE);
>>>>     clock_gettime(CLOCK_MONOTONIC, &t2);
>>>>
>>>>     // tput = num_pages*PAGE_SIZE/(t2-t1)
>>>>
>>>> ...
>>>> }
>>>>
>>>>
>>>> Measurements:
>>>> ============
>>>> vanilla: base kernel without patchset
>>>> mt:0 = MT kernel with use_mt_copy=0
>>>> mt:1..mt:32 = MT kernel with use_mt_copy=1 and thread cnt = 1,2,...,32
>>>>
>>>> Measured for both configuration push_0_pull_1=0 and push_0_pull_1=1 and
>>>> for 4KB migration and THP migration.
>>>>
>>>> --------------------
>>>> #1 push_0_pull_1 = 0 (src node CPUs are used)
>>>>
>>>> #1.1 THP=Never, 4KB (GB/s):
>>>> nr_pages vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>> 512         1.28    1.28    1.92    1.80    2.24    2.35    2.22    2.17
>>>> 4096        2.40    2.40    2.51    2.58    2.83    2.72    2.99    3.25
>>>> 8192        3.18    2.88    2.83    2.69    3.49    3.46    3.57    3.80
>>>> 16348       3.17    2.94    2.96    3.17    3.63    3.68    4.06    4.15
>>>>
>>>> #1.2 THP=Always, 2MB (GB/s):
>>>> nr_pages vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>> 512         4.31    5.02    3.39    3.40    3.33    3.51    3.91    4.03
>>>> 1024        7.13    4.49    3.58    3.56    3.91    3.87    4.39    4.57
>>>> 2048        5.26    6.47    3.91    4.00    3.71    3.85    4.97    6.83
>>>> 4096        9.93    7.77    4.58    3.79    3.93    3.53    6.41    4.77
>>>> 8192        6.47    6.33    4.37    4.67    4.52    4.39    5.30    5.37
>>>> 16348       7.66    8.00    5.20    5.22    5.24    5.28    6.41    7.02
>>>> 32768       8.56    8.62    6.34    6.20    6.20    6.19    7.18    8.10
>>>> 65536       9.41    9.40    7.14    7.15    7.15    7.19    7.96    8.89
>>>> 262144     10.17   10.19    7.26    7.90    7.98    8.05    9.46   10.30
>>>> 524288     10.40    9.95    7.25    7.93    8.02    8.76    9.55   10.30
>>>>
>>>> --------------------
>>>> #2 push_0_pull_1 = 1 (dst node CPUs are used):
>>>>
>>>> #2.1 THP=Never 4KB (GB/s):
>>>> nr_pages vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>> 512         1.28    1.36    2.01    2.74    2.33    2.31    2.53    2.96
>>>> 4096        2.40    2.84    2.94    3.04    3.40    3.23    3.31    4.16
>>>> 8192        3.18    3.27    3.34    3.94    3.77    3.68    4.23    4.76
>>>> 16348       3.17    3.42    3.66    3.21    3.82    4.40    4.76    4.89
>>>>
>>>> #2.2 THP=Always 2MB (GB/s):
>>>> nr_pages vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>>>> 512         4.31    5.91    4.03    3.73    4.26    4.13    4.78    3.44
>>>> 1024        7.13    6.83    4.60    5.13    5.03    5.19    5.94    7.25
>>>> 2048        5.26    7.09    5.20    5.69    5.83    5.73    6.85    8.13
>>>> 4096        9.93    9.31    4.90    4.82    4.82    5.26    8.46    8.52
>>>> 8192        6.47    7.63    5.66    5.85    5.75    6.14    7.45    8.63
>>>> 16348       7.66   10.00    6.35    6.54    6.66    6.99    8.18   10.21
>>>> 32768       8.56    9.78    7.06    7.41    7.76    9.02    9.55   11.92
>>>> 65536       9.41   10.00    8.19    9.20    9.32    8.68   11.00   13.31
>>>> 262144     10.17   11.17    9.01    9.96    9.99   10.00   11.70   14.27
>>>> 524288     10.40   11.38    9.07    9.98   10.01   10.09   11.95   14.48
>>>>
>>>> Note:
>>>> 1. For THP = Never: I'm doing for 16X pages to keep total size same for your
>>>>    experiment with 64KB pagesize)
>>>> 2. For THP = Always: nr_pages = Number of 4KB pages moved.
>>>>    nr_pages=512 => 512 4KB pages => 1 2MB page)
>>>>
>>>>
>>>> I'm seeing little (1.5X in some cases) to no benefits. The performance scaling is
>>>> relatively flat across thread counts.
>>>>
>>>> Is it possible I'm missing something in my testing?
>>>>
>>>> Could the base page size difference (4KB vs 64KB) be playing a role in
>>>> the scaling behavior? How the performance varies with 4KB pages on your system?
>>>>
>>>> I'd be happy to work with you on investigating this differences.
>>>> Let me know if you'd like any additional test data or if there are specific
>>>> configurations I should try.
>>>
>>> The results surprises me, since I was able to achieve ~9GB/s when migrating
>>> 16 2MB THPs with 16 threads on a two socket system with Xeon E5-2650 v3 @ 2.30GHz
>>> (a 19.2GB/s bandwidth QPI link between two sockets) back in 2019[1].
>>> These are 10-year-old Haswell CPUs. And your results above show that EPYC 5 can
>>> only achieve ~4GB/s when migrating 512 2MB THPs with 16 threads. It just does
>>> not make sense.
>>>
>>> One thing you might want to try is to set init_on_alloc=0 in your boot
>>> parameters to use folio_zero_user() instead of GFP_ZERO to zero pages. That
>>> might reduce the time spent on page zeros.
>>>
>>> I am also going to rerun the experiments locally on x86_64 boxes to see if your
>>> results can be replicated.
>>>
>>> Thank you for the review and running these experiments. I really appreciate
>>> it.
>>>
>>> [1] https://lore.kernel.org/linux-mm/20190404020046.32741-1-zi.yan@sent.com/
>>>
>>
>> Using init_on_alloc=0 gave significant performance gain over the last experiment
>> but I'm still missing the performance scaling you observed.
>
> It might be the difference between x86 and ARM64, but I am not 100% sure.
> Based on your data below, 2 or 4 threads seem to the sweep spot for
> the multi-threaded method on AMD CPUs. BTW, what is the bandwidth between
> two sockets in your system? From Figure 10 in [1], I see the InfiniteBand
> between two AMD EPYC 7601 @ 2.2GHz was measured at ~12GB/s unidirectional,
> ~25GB/s bidirectional. I wonder if your results below are cross-socket
> link bandwidth limited.
>
> From my results, NVIDIA Grace CPU can achieve high copy throughput
> with more threads between two sockets, maybe part of the reason is that
> its cross-socket link theoretical bandwidth is 900GB/s bidirectional.

I talked to my colleague about this and he mentioned the CCD
architecture on AMD CPUs. IIUC, one or two cores from one CCD can
already saturate the CCD’s outgoing bandwidth, and all CPUs are
enumerated from one CCD to another. This means my naive scheduling
algorithm, which uses the first N CPUs for N threads, uses all cores
from one CCD first, then moves to another CCD. As a result, it is not
able to saturate the cross-socket bandwidth. Does it make sense to you?
If yes, can you please change my CPU selection code in mm/copy_pages.c:

+	/* TODO: need a better cpu selection method */
+	for_each_cpu(cpu, per_node_cpumask) {
+		if (i >= total_mt_num)
+			break;
+		cpu_id_list[i] = cpu;
+		++i;
+	}

to select CPUs from as many CCDs as possible and rerun the tests?
That might boost the page migration throughput on AMD CPUs more.

Thanks.

>>
>> THP Never
>> nr_pages vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>> 512         1.40    1.43    2.79    3.48    3.63    3.73    3.63    3.57
>> 4096        2.54    3.32    3.18    4.65    4.83    5.11    5.39    5.78
>> 8192        3.35    4.40    4.39    4.71    3.63    5.04    5.33    6.00
>> 16348       3.76    4.50    4.44    5.33    5.41    5.41    6.47    6.41
>>
>> THP Always
>> nr_pages vanilla    mt:0    mt:1    mt:2    mt:4    mt:8   mt:16   mt:32
>> 512         5.21    5.47    5.77    6.92    3.71    2.75    7.54    7.44
>> 1024        6.10    7.65    8.12    8.41    8.87    8.55    9.13   11.36
>> 2048        6.39    6.66    9.58    8.92   10.75   12.99   13.33   12.23
>> 4096        7.33   10.85    8.22   13.57   11.43   10.93   12.53   16.86
>> 8192        7.26    7.46    8.88   11.82   10.55   10.94   13.27   14.11
>> 16348       9.07    8.53   11.82   14.89   12.97   13.22   16.14   18.10
>> 32768      10.45   10.55   11.79   19.19   16.85   17.56   20.58   26.57
>> 65536      11.00   11.12   13.25   18.27   16.18   16.11   19.61   27.73
>> 262144     12.37   12.40   15.65   20.00   19.25   19.38   22.60   31.95
>> 524288     12.44   12.33   15.66   19.78   19.06   18.96   23.31   32.29
>
> [1] https://www.dell.com/support/kbdoc/en-us/000143393/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study

Best Regards,
Yan, Zi
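
P.S. To make the "as many CCDs as possible" idea concrete, here is a rough
userspace sketch of one possible selection order. It is not kernel code and
not part of the patchset. It assumes the node's CPUs are numbered
contiguously CCD by CCD (as described above) and that every CCD has the
same number of CPUs. pick_cpus_across_ccds(), node_cpus and cpus_per_ccd
are made-up names for illustration; only cpu_id_list and total_mt_num
mirror the snippet from mm/copy_pages.c. A real implementation would have
to get the CCD boundaries from the kernel's topology information instead
of a fixed cpus_per_ccd.

```c
/*
 * Hypothetical userspace sketch, NOT the patchset's code.
 * Assumptions: node_cpus[] lists the node's CPUs in enumeration order,
 * CPUs are grouped contiguously per CCD, and each CCD holds
 * cpus_per_ccd CPUs (the last one may be partially populated).
 *
 * Fills cpu_id_list[] with up to total_mt_num CPUs, taking one CPU
 * from every CCD before taking a second CPU from any CCD.
 * Returns the number of CPUs picked.
 */
static int pick_cpus_across_ccds(const int *node_cpus, int nr_node_cpus,
				 int cpus_per_ccd, int *cpu_id_list,
				 int total_mt_num)
{
	int nr_ccds, picked = 0;

	if (cpus_per_ccd <= 0 || nr_node_cpus <= 0)
		return 0;

	/* Round up so a partially populated last CCD is still counted. */
	nr_ccds = (nr_node_cpus + cpus_per_ccd - 1) / cpus_per_ccd;

	/* Round 0 takes the first CPU of every CCD, round 1 the second, ... */
	for (int round = 0; round < cpus_per_ccd; round++) {
		for (int ccd = 0; ccd < nr_ccds; ccd++) {
			int idx = ccd * cpus_per_ccd + round;

			if (picked >= total_mt_num)
				return picked;
			if (idx >= nr_node_cpus)
				continue;	/* hole in the last CCD */
			cpu_id_list[picked++] = node_cpus[idx];
		}
	}
	return picked;
}
```

With 32 CPUs numbered 0..31, cpus_per_ccd = 8 and total_mt_num = 8, this
picks CPUs 0, 8, 16, 24, 1, 9, 17, 25, i.e. two cores from each of the four
CCDs, whereas the current loop would pick CPUs 0..7 from a single CCD.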