Date: Fri, 31 Jan 2025 13:59:09 +1100
From: Alistair Popple
To: linux-mm@kvack.org
Cc: lsf-pc@lists.linux-foundation.org, david@redhat.com, willy@infradead.org, ziy@nvidia.com, jhubbard@nvidia.com, jgg@nvidia.com, balbirs@nvidia.com
Subject: [LSF/MM/BPF TOPIC] The future of ZONE_DEVICE pages

I have a few topics that I would like to discuss around ZONE_DEVICE pages and
their current and future usage in the kernel. Generally these pages are used to
represent various forms of device memory (PCIe BAR space, coherent accelerator
memory, persistent memory, unaddressable device memory).
All of these require special treatment by the core MM so many features must be
implemented specifically for ZONE_DEVICE pages. I would like to get feedback on
several ideas I've had for a while:

Large page migration for ZONE_DEVICE pages
==========================================

Currently large ZONE_DEVICE pages only exist for persistent memory use cases
(DAX, FS DAX). This involves a special reference counting scheme which I hope
to have fixed[1] by the time of the LSF/MM/BPF. Fixing this allows for other
higher order ZONE_DEVICE folios.

Specifically I would like to introduce the possibility of migrating large CPU
folios to unaddressable (DEVICE_PRIVATE) or coherent (DEVICE_COHERENT) memory.
The current interfaces (migrate_vma) don't allow that as they require all
folios to be split.

Some of the issues are:

1. What should the interface look like?

These are non-lru pages, so likely there is overlap with "non-lru page
migration in a memdesc world"[2].

2. How do we allow merging/splitting of pages during migration?

This is necessary because when migrating back from device memory there may not
be enough large CPU pages available.

3. Any other issues?

[1] - https://lore.kernel.org/linux-mm/cover.11189864684e31260d1408779fac9db80122047b.1736488799.git-series.apopple@nvidia.com/
[2] - https://lore.kernel.org/linux-mm/2612ac8a-d0a9-452b-a53d-75ffc6166224@redhat.com/

File-backed DEVICE_PRIVATE/COHERENT pages
=========================================

Currently DEVICE_PRIVATE and DEVICE_COHERENT pages are only supported for
private anonymous memory. This prevents devices from having local access to
shared or file-backed mappings, instead relying on remote DMA access which
limits performance.

I have been prototyping allowing ZONE_DEVICE pages in the page cache with a
callback when the CPU requires access. This approach seems promising and
relatively straightforward but I would like some early feedback on either this
or alternate approaches that I should investigate.
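
As an illustration only, the "callback when the CPU requires access" idea from
the file-backed section can be modelled in a few lines of userspace C. This is
not the prototype and not kernel code: toy_page, cpu_access,
toy_migrate_to_host(), toy_migrate_to_device() and toy_cpu_read() are all
hypothetical names invented for this sketch. The only assumption borrowed from
the kernel is the general shape of the existing DEVICE_PRIVATE fault path,
where a driver callback (migrate_to_ram() in struct dev_pagemap_ops) brings the
data back before the CPU touches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64 /* tiny "page" so the example stays small */

/* Hypothetical stand-in for a page cache entry; not a kernel structure. */
struct toy_page {
	bool on_device;                      /* true: valid data is in device[] */
	unsigned char host[TOY_PAGE_SIZE];   /* CPU-accessible copy */
	unsigned char device[TOY_PAGE_SIZE]; /* models unaddressable device memory */
	int (*cpu_access)(struct toy_page *page); /* hypothetical driver callback */
	unsigned int nr_callbacks;           /* how often the callback ran */
};

/* Hypothetical driver callback: bring the data back to host memory. */
int toy_migrate_to_host(struct toy_page *page)
{
	assert(page != NULL);
	memcpy(page->host, page->device, TOY_PAGE_SIZE);
	page->on_device = false;
	page->nr_callbacks++;
	return 0;
}

/* Move the data to the device; the host copy is cleared to show it is stale. */
void toy_migrate_to_device(struct toy_page *page)
{
	assert(page != NULL);
	memcpy(page->device, page->host, TOY_PAGE_SIZE);
	memset(page->host, 0, TOY_PAGE_SIZE);
	page->on_device = true;
}

/*
 * CPU read path: if the page currently lives on the device, invoke the
 * callback first. Returns 0 on success and -1 on a bad offset or a failed
 * or missing callback.
 */
int toy_cpu_read(struct toy_page *page, size_t offset, unsigned char *out)
{
	assert(page != NULL && out != NULL);
	if (offset >= TOY_PAGE_SIZE)
		return -1;
	if (page->on_device) {
		if (!page->cpu_access || page->cpu_access(page) != 0)
			return -1;
	}
	*out = page->host[offset];
	return 0;
}
```

In this model the first CPU read after toy_migrate_to_device() triggers exactly
one callback and later reads hit the host copy directly. Writeback, locking and
concurrent device access are deliberately left out.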

Combining P2PDMA and DEVICE_PRIVATE pages
=========================================

Currently device memory that cannot be directly accessed via the CPU can be
represented by DEVICE_PRIVATE pages allowing it to be mapped and treated like a
normal virtual page by userspace. Many devices also support accessing device
memory directly from the CPU via a PCIe BAR.

This access requires a P2PDMA page, meaning there are potentially two pages
tracking the same piece of physical memory. This not only seems wasteful but
fraught - for example device drivers need to keep page lifetimes in sync. I
would like to discuss ways of solving this.

DEVICE_PRIVATE pages, the linear map and the memdesc world
==========================================================

DEVICE_PRIVATE pages currently reside in the linear map such that pfn_to_page()
and page_to_pfn() work "as expected". However this implies a contiguous range
of unused physical addresses needs to be both available and allocated for
device memory. This isn't always available, particularly on ARM[1] where the
vmemmap region may not be large enough to accommodate the amount of device
memory.

However it occurs to me that (almost?) all code paths that deal with
DEVICE_PRIVATE pages are already aware of this - in the case of page_to_pfn()
the page can be directly queried with is_device_private_page() and in the case
of pfn_to_page() the pfn has (almost?) always been obtained from a special swap
entry indicating such.

So does page_to_pfn()/pfn_to_page() really need to work for DEVICE_PRIVATE
pages? If not could we allocate the struct pages in a vmalloc array instead? Do
we even need ZONE_DEVICE pages/folios in a memdesc world?

[1] - https://lore.kernel.org/linux-arm-kernel/CAMj1kXHxyntweiq76CdW=ov2_CkEQUbdPekGNDtFp7rBCJJE2w@mail.gmail.com/

Other issues/ideas
==================

Are there any other clean-ups or features that people are interested in seeing?

 - Alistair