From: Zi Yan
To: Alistair Popple
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, david@redhat.com, willy@infradead.org, jhubbard@nvidia.com, jgg@nvidia.com, balbirs@nvidia.com, christian.koenig@amd.com
Subject: Re: [LSF/MM/BPF TOPIC] The future of ZONE_DEVICE pages
Date: Fri, 31 Jan 2025 10:34:32 -0500
In-Reply-To: <4ciym2rrwkttnlym77ebywwn4ppesycjwm2dcoffs74eslu6uf@gvvnthaxkdoc>
References: <46C1B764-16ED-4559-9A6A-C70C99033BA5@nvidia.com> <4ciym2rrwkttnlym77ebywwn4ppesycjwm2dcoffs74eslu6uf@gvvnthaxkdoc>
On 31 Jan 2025, at 0:50, Alistair Popple wrote:

> On Thu, Jan 30, 2025 at 10:58:22PM -0500, Zi Yan wrote:
>> On 30 Jan 2025, at 21:59, Alistair Popple wrote:
>>
>>> I have a few topics that I would like to discuss around ZONE_DEVICE pages
>>> and their current and future usage in the kernel. Generally these pages are
>>> used to represent various forms of device memory (PCIe BAR space, coherent
>>> accelerator memory, persistent memory, unaddressable device memory). All
>>> of these require special treatment by the core MM so many features must be
>>> implemented specifically for ZONE_DEVICE pages.
>>>
>>> I would like to get feedback on several ideas I've had for a while:
>>>
>>> Large page migration for ZONE_DEVICE pages
>>> ==========================================
>>>
>>> Currently large ZONE_DEVICE pages only exist for persistent memory use cases
>>> (DAX, FS DAX). This involves a special reference counting scheme which I hope to
>>> have fixed[1] by the time of the LSF/MM/BPF. Fixing this allows for other higher
>>> order ZONE_DEVICE folios.
>>>
>>> Specifically I would like to introduce the possibility of migrating large CPU
>>> folios to unaddressable (DEVICE_PRIVATE) or coherent (DEVICE_COHERENT) memory.
>>> The current interfaces (migrate_vma) don't allow that as they require all folios
>>> to be split.
>>>
>>> Some of the issues are:
>>>
>>> 1. What should the interface look like?
>>>
>>> These are non-lru pages, so likely there is overlap with "non-lru page migration
>>> in a memdesc world"[2]
>>
>> It seems to me that unaddressable (DEVICE_PRIVATE) and coherent (DEVICE_COHERENT)
>> should be treated differently, since CPU cannot access the former but can access
>> the latter. Am I getting it right?
>
> In some ways they are similar (they are non-LRU pages, core-MM doesn't in
> general touch them for eg. reclaim, etc) but as you say they are also different
> in that the latter can be accessed directly from the CPU.
>
> The key thing they have in common though is they only get mapped into userspace
> via a device-driver explicitly migrating them there, hence why I have included
> them here.
>
>>>
>>> 2. How do we allow merging/splitting of pages during migration?
>>>
>>> This is necessary because when migrating back from device memory there may not
>>> be enough large CPU pages available.
>>
>> It is similar to THP swap out and swap in: we just swap out a whole THP
>> but swap in individual base pages. But there is a discussion on large folio swapin[1]
>> that might change it.
>>
>> [1] https://lore.kernel.org/linux-mm/58716200-fd10-4487-aed3-607a10e9fdd0@gmail.com/
>>
>>>
>>> 3. Any other issues?
>>
>> Once a large folio is migrated to device, when CPU wants to access the data, even
>> if there is enough memory in CPU memory, we might not want to migrate back the
>> entire large folio, since maybe only a base page is shared between CPU and the device.
>> Bouncing a large folio for data shared within a base page would be wasteful.
>
> Indeed. This bouncing normally happens via a migrate_to_ram() callback so I was
> thinking this would be one instance where a driver might want to split a page
> when migrating back with eg. migrate_vma_*().
>
>> I think about doing something like PCIe atomic from a device. Does it make sense?
>
> I'm not sure I follow where exactly PCIe atomics fit in here? If a page has been
> migrated to a GPU we wouldn't need PCIe atomics. Or are you saying avoiding PCIe
> atomics might be another reason a page might need to be split? (ie. CPU is doing
> atomic access to one subpage, GPU to another)

Oh, I got PCIe atomics wrong. I thought migration was needed even for PCIe
atomics. Forget about my comment on PCIe atomics.

>
>>>
>>> [1] - https://lore.kernel.org/linux-mm/cover.11189864684e31260d1408779fac9db80122047b.1736488799.git-series.apopple@nvidia.com/
>>> [2] - https://lore.kernel.org/linux-mm/2612ac8a-d0a9-452b-a53d-75ffc6166224@redhat.com/
>>>
>>> File-backed DEVICE_PRIVATE/COHERENT pages
>>> =========================================
>>>
>>> Currently DEVICE_PRIVATE and DEVICE_COHERENT pages are only supported for
>>> private anonymous memory. This prevents devices from having local access to
>>> shared or file-backed mappings instead relying on remote DMA access which limits
>>> performance.
>>>
>>> I have been prototyping allowing ZONE_DEVICE pages in the page cache with
>>> a callback when the CPU requires access. This approach seems promising and
>>> relatively straight-forward but I would like some early feedback on either this
>>> or alternate approaches that I should investigate.
>>>
>>> Combining P2PDMA and DEVICE_PRIVATE pages
>>> =========================================
>>>
>>> Currently device memory that cannot be directly accessed via the CPU can be
>>> represented by DEVICE_PRIVATE pages allowing it to be mapped and treated like
>>> a normal virtual page by userspace. Many devices also support accessing device
>>> memory directly from the CPU via a PCIe BAR.
>>>
>>> This access requires a P2PDMA page, meaning there are potentially two pages
>>> tracking the same piece of physical memory. This not only seems wasteful but
>>> fraught - for example device drivers need to keep page lifetimes in sync. I
>>> would like to discuss ways of solving this.
>>>
>>> DEVICE_PRIVATE pages, the linear map and the memdesc world
>>> ==========================================================
>>>
>>> DEVICE_PRIVATE pages currently reside in the linear map such that pfn_to_page()
>>> and page_to_pfn() work "as expected". However this implies a contiguous range
>>> of unused physical addresses need to be both available and allocated for device
>>> memory. This isn't always available, particularly on ARM[1] where the vmemmap
>>> region may not be large enough to accommodate the amount of device memory.
>>>
>>> However it occurs to me that (almost?) all code paths that deal with
>>> DEVICE_PRIVATE pages are already aware of this - in the case of page_to_pfn()
>>> the page can be directly queried with is_device_private_page() and in the case
>>> of pfn_to_page() the pfn has (almost?) always been obtained from a special swap
>>> entry indicating such.
>>>
>>> So does page_to_pfn()/pfn_to_page() really need to work for DEVICE_PRIVATE
>>> pages? If not could we allocate the struct pages in a vmalloc array instead? Do
>>> we even need ZONE_DEVICE pages/folios in a memdesc world?
>>
>> It occurs to me as well when I am reading your migration proposal above.
>> struct page is not used for DEVICE_PRIVATE, maybe it is OK to get rid of it.
>> How about DEVICE_COHERENT? Is its struct page used currently? I see AMD kfd
>> driver is using DEVICE_COHERENT (Christian König cc'd).
>
> I'm not sure removing struct page for DEVICE_COHERENT would be so straight
> forward. Unlike DEVICE_PRIVATE pages these are mapped by normal present
> PTEs so we can't rely on having a special PTE to figure out which variant of
> pfn_to_{page|memdesc|thing}() to call.
>
> On the other hand this is real memory in the physical address space, and so
> should probably be covered by the linear map anyway and have their own reserved
> region of physical address space. This is unlike DEVICE_PRIVATE entries which
> effectively need to steal some physical address space.

Got it. Like you said above, DEVICE_PRIVATE and DEVICE_COHERENT are both non-lru
pages, but only DEVICE_COHERENT can be accessed by CPU. We probably want to
categorize them differently based on DavidH's email[1]:

DEVICE_PRIVATE: non-folio migration

DEVICE_COHERENT: non-LRU folio migration

[1] https://lore.kernel.org/linux-mm/bb0f813e-7c1b-4257-baa5-5afe18be8552@redhat.com/

Best Regards,
Yan, Zi
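
To make the split-on-migrate-back point from the thread a bit more concrete,
here is a rough user-space toy model, not kernel code. Everything in it is
hypothetical and invented for illustration (struct toy_migrate_plan,
toy_plan_migrate_to_ram() and its parameters); it does not correspond to any
existing migrate_vma_*() or migrate_to_ram() interface. It only encodes the two
reasons discussed above for splitting a large device folio when the CPU faults
on it: no large CPU folio may be available, or only one base page may actually
be shared with the CPU.

```c
/* Toy user-space model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of what to migrate back on a CPU fault. */
struct toy_migrate_plan {
	bool split;		/* split the large device folio first? */
	size_t first_page;	/* index of the first base page to migrate */
	size_t nr_pages;	/* number of base pages to migrate */
};

/*
 * Decide how much of a device folio of the given order to bring back when
 * the CPU faults on base page fault_index.
 *
 * large_cpu_folio_available: assumed result of trying to allocate a CPU
 *	folio of the same order.
 * want_whole_folio: assumed driver hint that the CPU will use most of the
 *	folio, so bouncing it back whole is not wasteful.
 */
static struct toy_migrate_plan
toy_plan_migrate_to_ram(unsigned int order, size_t fault_index,
			bool large_cpu_folio_available, bool want_whole_folio)
{
	size_t nr = (size_t)1 << order;
	struct toy_migrate_plan plan;

	/* The faulting page must lie inside the folio. */
	assert(fault_index < nr);

	if (order == 0 || (large_cpu_folio_available && want_whole_folio)) {
		/* Nothing to split, or moving the whole folio is acceptable. */
		plan.split = false;
		plan.first_page = 0;
		plan.nr_pages = nr;
	} else {
		/* Split and migrate only the base page the CPU touched. */
		plan.split = true;
		plan.first_page = fault_index;
		plan.nr_pages = 1;
	}
	return plan;
}
```

In this model an order-9 folio faulted at base page 5 comes back whole only
when a large CPU folio is available and the whole-folio hint is set; in every
other case only base page 5 is migrated. A real driver would presumably make
this choice inside its migrate_to_ram() callback, with whatever interface the
discussion ends up agreeing on.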