From: Zi Yan
To: Alistair Popple
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, david@redhat.com,
    willy@infradead.org, jhubbard@nvidia.com, jgg@nvidia.com,
    balbirs@nvidia.com, christian.koenig@amd.com
Subject: Re: [LSF/MM/BPF TOPIC] The future of ZONE_DEVICE pages
Date: Thu, 30 Jan 2025 22:58:22 -0500
Message-ID: <46C1B764-16ED-4559-9A6A-C70C99033BA5@nvidia.com>

On 30 Jan 2025, at 21:59, Alistair Popple wrote:

> I have a few topics that I would like to discuss around ZONE_DEVICE pages
> and their current and future usage in the kernel. Generally these pages are
> used to represent various forms of device memory (PCIe BAR space, coherent
> accelerator memory, persistent memory, unaddressable device memory). All
> of these require special treatment by the core MM so many features must be
> implemented specifically for ZONE_DEVICE pages.
>
> I would like to get feedback on several ideas I've had for a while:
>
> Large page migration for ZONE_DEVICE pages
> ==========================================
>
> Currently large ZONE_DEVICE pages only exist for persistent memory use cases
> (DAX, FS DAX). This involves a special reference counting scheme which I hope to
> have fixed[1] by the time of the LSF/MM/BPF. Fixing this allows for other higher
> order ZONE_DEVICE folios.
>
> Specifically I would like to introduce the possibility of migrating large CPU
> folios to unaddressable (DEVICE_PRIVATE) or coherent (DEVICE_COHERENT) memory.
> The current interfaces (migrate_vma) don't allow that as they require all folios
> to be split.
>
> Some of the issues are:
>
> 1. What should the interface look like?
>
> These are non-lru pages, so likely there is overlap with "non-lru page migration
> in a memdesc world"[2]

It seems to me that unaddressable (DEVICE_PRIVATE) and coherent
(DEVICE_COHERENT) memory should be treated differently, since the CPU cannot
access the former but can access the latter. Am I getting it right?

>
> 2. How do we allow merging/splitting of pages during migration?
>
> This is necessary because when migrating back from device memory there may not
> be enough large CPU pages available.

This is similar to THP swap-out and swap-in: we swap out a whole THP but swap
in individual base pages. There is a discussion on large folio swapin[1] that
might change this, though.

[1] https://lore.kernel.org/linux-mm/58716200-fd10-4487-aed3-607a10e9fdd0@gmail.com/

>
> 3. Any other issues?

Once a large folio has been migrated to the device and the CPU wants to access
the data, we might not want to migrate back the entire large folio even if
there is enough CPU memory, since maybe only a base page is shared between the
CPU and the device. Bouncing a large folio for data shared within a base page
would be wasteful.

I am thinking about doing something like PCIe atomics from a device. Does that
make sense?

>
> [1] - https://lore.kernel.org/linux-mm/cover.11189864684e31260d1408779fac9db80122047b.1736488799.git-series.apopple@nvidia.com/
> [2] - https://lore.kernel.org/linux-mm/2612ac8a-d0a9-452b-a53d-75ffc6166224@redhat.com/
>
> File-backed DEVICE_PRIVATE/COHERENT pages
> =========================================
>
> Currently DEVICE_PRIVATE and DEVICE_COHERENT pages are only supported for
> private anonymous memory. This prevents devices from having local access to
> shared or file-backed mappings instead relying on remote DMA access which limits
> performance.
>
> I have been prototyping allowing ZONE_DEVICE pages in the page cache with
> a callback when the CPU requires access. This approach seems promising and
> relatively straight-forward but I would like some early feedback on either this
> or alternate approaches that I should investigate.
>
> Combining P2PDMA and DEVICE_PRIVATE pages
> =========================================
>
> Currently device memory that cannot be directly accessed via the CPU can be
> represented by DEVICE_PRIVATE pages allowing it to be mapped and treated like
> a normal virtual page by userspace. Many devices also support accessing device
> memory directly from the CPU via a PCIe BAR.
>
> This access requires a P2PDMA page, meaning there are potentially two pages
> tracking the same piece of physical memory. This not only seems wasteful but
> fraught - for example device drivers need to keep page lifetimes in sync. I
> would like to discuss ways of solving this.
>
> DEVICE_PRIVATE pages, the linear map and the memdesc world
> ==========================================================
>
> DEVICE_PRIVATE pages currently reside in the linear map such that pfn_to_page()
> and page_to_pfn() work "as expected". However this implies a contiguous range
> of unused physical addresses need to be both available and allocated for device
> memory. This isn't always available, particularly on ARM[1] where the vmemmap
> region may not be large enough to accommodate the amount of device memory.
>
> However it occurs to me that (almost?) all code paths that deal with
> DEVICE_PRIVATE pages are already aware of this - in the case of page_to_pfn()
> the page can be directly queried with is_device_private_page() and in the case
> of pfn_to_page() the pfn has (almost?) always been obtained from a special swap
> entry indicating such.
>
> So does page_to_pfn()/pfn_to_page() really need to work for DEVICE_PRIVATE
> pages? If not could we allocate the struct pages in a vmalloc array instead? Do
> we even need ZONE_DEVICE pages/folios in a memdesc world?

The same thought occurred to me while reading your migration proposal above.
struct page is not used for DEVICE_PRIVATE, so maybe it is OK to get rid of it.
How about DEVICE_COHERENT? Is its struct page used currently? I see the AMD kfd
driver is using DEVICE_COHERENT (Christian König cc'd).

--
Best Regards,
Yan, Zi