From mboxrd@z Thu Jan 1 00:00:00 1970
From: Honglei Huang <Honglei1.Huang@amd.com>
Date: Mon, 9 Feb 2026 23:46:46 +0800
Subject: Re: [PATCH v3 0/8] drm/amdkfd: Add batch userptr allocation support
To: Christian König
Cc: Felix.Kuehling@amd.com, Philip.Yang@amd.com, Ray.Huang@amd.com,
 alexander.deucher@amd.com, dmitry.osipenko@collabora.com,
 Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch,
 amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org
References: <20260206062557.3718801-1-honglei1.huang@amd.com>
 <8ba8e4f2-89f2-4968-a291-e36e6fc8ab9b@amd.com>
 <38264429-a256-4c2f-bcfd-8a021d9603b2@amd.com>
 <451400e6-bbe0-4186-bae6-1bf64181c378@amd.com>
 <0eaf1785-0f84-45e5-b960-c995c1b1cf1e@amd.com>
 <648e06d1-b854-466f-bf13-0c36ee2c36a1@amd.com>
 <9c7ab1b2-1a78-43d7-b4a7-5bc561158380@amd.com>
 <410040f0-d7eb-4a35-9e4b-54a3517a5cfe@amd.com>
User-Agent: Mozilla Thunderbird
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US

Agreed with you that with many ranges, the probability of cross-invalidation 
during sequential hmm_range_fault() calls increases, and in an extreme 
scenario this could lead to excessive retries. I had been focused on proving 
correctness and missed the scalability aspect.

I propose the following plan:

Will add a retry limit similar to what DRM GPU SVM does with 
DRM_GPUSVM_MAX_RETRIES. This bounds the worst case and may be enough to make 
the current batch userptr usable; a rough sketch of what I have in mind is 
below.

And I agree that teaching walk_page_range() to handle non-contiguous VA sets 
in a single walk would be the proper long-term solution. That work would 
benefit more than just KFD batch userptr. I will keep looking for a better 
solution there.
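Roughly like this (illustration only, not the final code; 
KFD_USERPTR_MAX_RETRIES and the helper signatures are placeholders for the 
batch userptr paths mentioned earlier in this thread):

    /* Bounded retry around the whole batch, analogous to
     * DRM_GPUSVM_MAX_RETRIES in drm_gpusvm.c. */
    #define KFD_USERPTR_MAX_RETRIES 3

    static int restore_userptr_batch(struct kgd_mem *mem)
    {
            struct amdkfd_process_info *process_info = mem->process_info;
            unsigned int tries = 0;
            unsigned int saved_invalid;
            int ret;

            do {
                    if (tries++ >= KFD_USERPTR_MAX_RETRIES)
                            return -EBUSY;  /* give up; restore worker reschedules */

                    mutex_lock(&process_info->notifier_lock);
                    saved_invalid = mem->invalid;
                    mutex_unlock(&process_info->notifier_lock);

                    /* per-range hmm_range_fault(), outside the notifier lock */
                    ret = update_invalid_user_pages(mem);
                    if (ret)
                            return ret;

                    /* commit phase: re-check under the lock, -EAGAIN on a race */
                    mutex_lock(&process_info->notifier_lock);
                    ret = confirm_valid_user_pages_locked(mem, saved_invalid);
                    mutex_unlock(&process_info->notifier_lock);
            } while (ret == -EAGAIN);

            return ret;
    }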
Regards,
Honglei

On 2026/2/9 23:07, Christian König wrote:
> On 2/9/26 15:44, Honglei Huang wrote:
>> You said that DRM GPU SVM has the same pattern, but argued
>> that it is not designed for "batch userptr". However, this distinction
>> has no technical significance. The core problem is "multiple ranges
>> under one wide notifier doing per-range hmm_range_fault". Whether
>> these ranges are dynamically created by GPU page faults or
>> batch-specified via ioctl, the concurrency safety mechanism is the same.
>>
>> You said "each hmm_range_fault() can invalidate the other ranges
>> while faulting them in". Yes, this can happen, but this is precisely
>> the scenario that mem->invalid catches:
>>
>>   1. hmm_range_fault(A) succeeds
>>   2. hmm_range_fault(B) triggers reclaim → A's pages swapped out
>>      → MMU notifier callback:
>>        mutex_lock(notifier_lock)
>>          range_A->valid = false
>>          mem->invalid++
>>        mutex_unlock(notifier_lock)
>>   3. hmm_range_fault(B) completes
>>   4. Commit phase:
>>        mutex_lock(notifier_lock)
>>          mem->invalid != saved_invalid
>>          → return -EAGAIN, retry entire batch
>>        mutex_unlock(notifier_lock)
>>
>>  Invalid pages are never committed.
>
> Once more, that is not the problem. I completely agree that this is all correctly handled.
>
> The problem is that the more hmm_ranges you get, the more likely it is that getting another pfn invalidates a pfn you previously acquired.
>
> So this can end up in an endless loop, and that's why the GPUSVM code also has a timeout on the retry.
>
>
> What you need to figure out is how to teach hmm_range_fault() and the underlying walk_page_range() how to skip entries which you are not interested in.
>
> Just a trivial example, assuming you have the following VAs you want your userptr to be filled in with: 3, 1, 5, 8, 7, 2
>
> To handle this case you need to build a data structure which tells you what is the smallest, largest and where each VA in the middle comes in. So you need something like: 1->1, 2->5, 3->0, 5->2, 7->4, 8->3
>
> Then you would call walk_page_range(mm, 1, 8, ops, data), the pud walk decides if it needs to go into pmd or eventually fault, the pmd walk decides if PTEs need to be filled in etc...
>
> The final pte handler then fills in the pfns linearly for the addresses you need.
>
> And yeah, I perfectly know that this is horribly complicated, but as far as I can see everything else will just not scale.
>
> Creating hundreds of separate userptrs only scales up to a few megabytes and then falls apart.
>
> Regards,
> Christian.
>
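Just to check that I understood the walk_page_range() direction, a very 
rough, untested sketch of the index-map idea. kfd_scatter_walk, cmp_ulong 
and the other names are made up for illustration, and real code would still 
need to fault non-present PTEs and then retry:

    struct kfd_scatter_walk {
            unsigned long *sorted_va;       /* page-aligned VAs, ascending */
            unsigned int *out_slot;         /* sorted_va[i] -> slot in pfns[] */
            unsigned long npages;
            unsigned long *pfns;            /* output, one entry per requested VA */
    };

    static int scatter_pte_entry(pte_t *pte, unsigned long addr,
                                 unsigned long next, struct mm_walk *walk)
    {
            struct kfd_scatter_walk *sw = walk->private;
            unsigned long *va;

            /* is this address one of the requested ones? */
            va = bsearch(&addr, sw->sorted_va, sw->npages,
                         sizeof(*sw->sorted_va), cmp_ulong);
            if (!va)
                    return 0;               /* not interesting, keep walking */

            if (!pte_present(*pte))
                    return -EFAULT;         /* would need to fault and retry */

            sw->pfns[sw->out_slot[va - sw->sorted_va]] = pte_pfn(*pte);
            return 0;
    }

    static const struct mm_walk_ops scatter_walk_ops = {
            .pte_entry = scatter_pte_entry,
    };

    static int scatter_walk(struct mm_struct *mm, struct kfd_scatter_walk *sw)
    {
            /* caller holds mmap_read_lock(mm); one walk covers the whole span */
            return walk_page_range(mm, sw->sorted_va[0],
                                   sw->sorted_va[sw->npages - 1] + PAGE_SIZE,
                                   &scatter_walk_ops, sw);
    }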
>>
>> Regards,
>> Honglei
>>
>>
>> On 2026/2/9 22:25, Christian König wrote:
>>> On 2/9/26 15:16, Honglei Huang wrote:
>>>> The case you described: one hmm_range_fault() invalidating another's
>>>> seq under the same notifier, is already handled in the implementation.
>>>>
>>>>   Example: suppose ranges A, B, C share one notifier:
>>>>
>>>>    1. hmm_range_fault(A) succeeds, seq_A recorded
>>>>    2. External invalidation occurs, triggers callback:
>>>>       mutex_lock(notifier_lock)
>>>>         → mmu_interval_set_seq()
>>>>         → range_A->valid = false
>>>>         → mem->invalid++
>>>>       mutex_unlock(notifier_lock)
>>>>    3. hmm_range_fault(B) succeeds
>>>>    4. Commit phase:
>>>>       mutex_lock(notifier_lock)
>>>>         → check mem->invalid != saved_invalid
>>>>         → return -EAGAIN, retry the entire batch
>>>>       mutex_unlock(notifier_lock)
>>>>
>>>> All concurrent invalidations are caught by the mem->invalid counter.
>>>> Additionally, amdgpu_ttm_tt_get_user_pages_done() in confirm_valid_user_pages_locked
>>>> performs a per-range mmu_interval_read_retry() as a final safety check.
>>>>
>>>> DRM GPU SVM uses the same approach: drm_gpusvm_get_pages() also calls
>>>> hmm_range_fault() per-range independently; there is no array version
>>>> of hmm_range_fault in DRM GPU SVM either. If you consider this approach
>>>> unworkable, then DRM GPU SVM would be unworkable too, yet it has been
>>>> accepted upstream.
>>>>
>>>> The number of batch ranges is controllable. And even if it
>>>> scales to thousands, DRM GPU SVM faces exactly the same situation:
>>>> it does not need an array version of hmm_range_fault either, which
>>>> shows this is a correctness question, not a performance one. For
>>>> correctness, I believe DRM GPU SVM already demonstrates the approach
>>>> is ok.
>>>
>>> Well yes, GPU SVM would have exactly the same problems. But that also doesn't have a bulk userptr creation interface.
>>>
>>> The implementation is simply not made for this use case, and as far as I know no current upstream implementation is.
>>>
>>>> For performance, I have tested with thousands of ranges present:
>>>> performance reaches 80%~95% of the native driver, and all OpenCL
>>>> and ROCr test suites pass with no correctness issues.
>>>
>>> Testing can only falsify a system, not verify it.
>>>
>>>> Here is how DRM GPU SVM handles correctness with multiple ranges
>>>> under one wide notifier doing per-range hmm_range_fault:
>>>>
>>>>    Invalidation: drm_gpusvm_notifier_invalidate()
>>>>      - Acquires notifier_lock
>>>>      - Calls mmu_interval_set_seq()
>>>>      - Iterates affected ranges via driver callback (xe_svm_invalidate)
>>>>      - Clears has_dma_mapping = false for each affected range (under lock)
>>>>      - Releases notifier_lock
>>>>
>>>>    Fault: drm_gpusvm_get_pages()  (called per-range independently)
>>>>      - mmu_interval_read_begin() to get seq
>>>>      - hmm_range_fault() outside lock
>>>>      - Acquires notifier_lock
>>>>      - mmu_interval_read_retry() → if stale, release lock and retry
>>>>      - DMA map pages + set has_dma_mapping = true (under lock)
>>>>      - Releases notifier_lock
>>>>
>>>>    Validation: drm_gpusvm_pages_valid()
>>>>      - Checks has_dma_mapping flag (under lock), NOT seq
>>>>
>>>> If invalidation occurs between two per-range faults, the flag is
>>>> cleared under lock, and either mmu_interval_read_retry catches it
>>>> in the current fault, or drm_gpusvm_pages_valid() catches it at
>>>> validation time. No stale pages are ever committed.
>>>>
>>>> KFD batch userptr uses the same three-step pattern:
>>>>
>>>>    Invalidation: amdgpu_amdkfd_evict_userptr_batch()
>>>>      - Acquires notifier_lock
>>>>      - Calls mmu_interval_set_seq()
>>>>      - Iterates affected ranges via interval_tree
>>>>      - Sets range->valid = false for each affected range (under lock)
>>>>      - Increments mem->invalid (under lock)
>>>>      - Releases notifier_lock
>>>>
>>>>    Fault: update_invalid_user_pages()
>>>>      - Per-range hmm_range_fault() outside lock
>>>
>>> And here the idea falls apart. Each hmm_range_fault() can invalidate the other ranges while faulting them in.
>>>
>>> That is not fundamentally solvable, but moving the handling further into hmm_range_fault() makes it much less likely that something goes wrong.
>>>
>>> So once more, as long as this still uses this hacky approach I will clearly reject this implementation.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>      - Acquires notifier_lock
>>>>      - Checks mem->invalid != saved_invalid → if changed, -EAGAIN retry
>>>>      - Sets range->valid = true for faulted ranges (under lock)
>>>>      - Releases notifier_lock
>>>>
>>>>    Validation: valid_user_pages_batch()
>>>>      - Checks range->valid flag
>>>>      - Calls amdgpu_ttm_tt_get_user_pages_done() (mmu_interval_read_retry)
>>>>
>>>> The logic is equivalent as far as I can see.
>>>>
>>>> Regards,
>>>> Honglei
>>>>
>>>>
>>>>
>>>> On 2026/2/9 21:27, Christian König wrote:
>>>>> On 2/9/26 14:11, Honglei Huang wrote:
>>>>>>
>>>>>> So the drm svm is also a NAK?
>>>>>>
>>>>>> These changes have passed local testing (OpenCL and ROCr), and I also
>>>>>> provided a detailed code path and analysis. You only stated the
>>>>>> conclusion without providing any reasons or evidence. Your statement
>>>>>> has no justifiable reasons and is difficult to be convinced by so far.
>>>>>
>>>>> That sounds like you don't understand what the issue here is, so I will try to explain it once more with pseudo-code.
>>>>>
>>>>> Page tables are updated without holding a lock, so when you want to grab physical addresses from them you need to use an optimistic, retry-based approach to make sure that the data you got is still valid.
>>>>>
>>>>> In other words, something like this is needed:
>>>>>
>>>>> retry:
>>>>>      hmm_range.notifier_seq = mmu_interval_read_begin(notifier);
>>>>>      hmm_range.hmm_pfns = kvmalloc_array(npages, ...);
>>>>> ...
>>>>>      while (true) {
>>>>>          mmap_read_lock(mm);
>>>>>          err = hmm_range_fault(&hmm_range);
>>>>>          mmap_read_unlock(mm);
>>>>>
>>>>>          if (err == -EBUSY) {
>>>>>              if (time_after(jiffies, timeout))
>>>>>                  break;
>>>>>
>>>>>              hmm_range.notifier_seq =
>>>>>                  mmu_interval_read_begin(notifier);
>>>>>              continue;
>>>>>          }
>>>>>          break;
>>>>>      }
>>>>> ...
>>>>>      for (i = 0, j = 0; i < npages; ++j) {
>>>>> ...
>>>>>          dma_map_page(...)
>>>>> ...
>>>>>      grab_notifier_lock();
>>>>>      if (mmu_interval_read_retry(notifier, hmm_range.notifier_seq))
>>>>>          goto retry;
>>>>>      restart_queues();
>>>>>      drop_notifier_lock();
>>>>> ...
>>>>>
>>>>> Now hmm_range.notifier_seq indicates whether your DMA addresses are still valid after you grabbed the notifier lock.
>>>>>
>>>>> The problem is that hmm_range works only on a single range/sequence combination, so when you do multiple calls to hmm_range_fault() for scattered VAs it can easily be that one call invalidates the ranges of another call.
>>>>>
>>>>> So as long as you only have a few hundred hmm_ranges for your userptrs that kind of works, but it doesn't scale up into the thousands of different VA addresses you get for scattered handling.
>>>>>
>>>>> That's why hmm_range_fault needs to be modified to handle an array of VA addresses instead of just an A..B range.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>>>
>>>>>> On 2026/2/9 20:59, Christian König wrote:
>>>>>>> On 2/9/26 13:52, Honglei Huang wrote:
>>>>>>>> DRM GPU SVM does use hmm_range_fault(), see drm_gpusvm_get_pages()
>>>>>>>
>>>>>>> I'm not sure what you are talking about, drm_gpusvm_get_pages() only supports a single range as well and not scatter-gather of VA addresses.
>>>>>>>
>>>>>>> As far as I can see that doesn't help in the slightest.
>>>>>>>
>>>>>>>> My implementation follows the same pattern. The detailed comparison
>>>>>>>> of the invalidation path was provided in the second half of my previous mail.
>>>>>>>
>>>>>>> Yeah, and as I said that is not very valuable because it doesn't solve the sequence problem.
>>>>>>>
>>>>>>> As far as I can see the approach you try here is a clear NAK from my side.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> On 2026/2/9 18:16, Christian König wrote:
>>>>>>>>> On 2/9/26 07:14, Honglei Huang wrote:
>>>>>>>>>>
>>>>>>>>>> I've reworked the implementation in v4. The fix is actually inspired
>>>>>>>>>> by the DRM GPU SVM framework (drivers/gpu/drm/drm_gpusvm.c).
>>>>>>>>>>
>>>>>>>>>> DRM GPU SVM uses wide notifiers (recommended 512M or larger) to track
>>>>>>>>>> multiple user virtual address ranges under a single mmu_interval_notifier,
>>>>>>>>>> and these ranges can be non-contiguous, which is essentially the same
>>>>>>>>>> problem that batch userptr needs to solve: one BO backed by multiple
>>>>>>>>>> non-contiguous CPU VA ranges sharing one notifier.
>>>>>>>>>
>>>>>>>>> That still doesn't solve the sequencing problem.
>>>>>>>>>
>>>>>>>>> As far as I can see you can't use hmm_range_fault with this approach, or it would just not be very valuable.
>>>>>>>>>
>>>>>>>>> So how should that work with your patch set?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The wide notifier is created in drm_gpusvm_notifier_alloc:
>>>>>>>>>>       notifier->itree.start = ALIGN_DOWN(fault_addr, gpusvm->notifier_size);
>>>>>>>>>>       notifier->itree.last = ALIGN(fault_addr + 1, gpusvm->notifier_size) - 1;
>>>>>>>>>> The Xe driver passes
>>>>>>>>>>       xe_modparam.svm_notifier_size * SZ_1M in xe_svm_init
>>>>>>>>>> as the notifier_size, so one notifier can cover many MB of VA space
>>>>>>>>>> containing multiple non-contiguous ranges.
>>>>>>>>>>
>>>>>>>>>> And DRM GPU SVM solves the per-range validity problem with flag-based
>>>>>>>>>> validation instead of seq-based validation:
>>>>>>>>>>       - drm_gpusvm_pages_valid() checks
>>>>>>>>>>           flags.has_dma_mapping
>>>>>>>>>>         not notifier_seq. The comment explicitly states:
>>>>>>>>>>           "This is akin to a notifier seqno check in the HMM documentation
>>>>>>>>>>            but due to wider notifiers (i.e., notifiers which span multiple
>>>>>>>>>>            ranges) this function is required for finer grained checking"
>>>>>>>>>>       - __drm_gpusvm_unmap_pages() clears
>>>>>>>>>>           flags.has_dma_mapping = false  under notifier_lock
>>>>>>>>>>       - drm_gpusvm_get_pages() sets
>>>>>>>>>>           flags.has_dma_mapping = true  under notifier_lock
>>>>>>>>>> I adopted the same approach.
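(For what it's worth, the flag check on the KFD side is meant to look 
roughly like the sketch below; range_count/ranges are simplified placeholder 
field names, and the real path additionally calls 
amdgpu_ttm_tt_get_user_pages_done() per range.)

    static bool valid_user_pages_batch(struct kgd_mem *mem)
    {
            unsigned int i;

            /* caller holds process_info->notifier_lock */
            for (i = 0; i < mem->range_count; i++) {
                    /* cleared by the notifier for any overlapping invalidation */
                    if (!mem->ranges[i].valid)
                            return false;
            }
            return true;
    }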
>>>>>>>>>>
>>>>>>>>>> DRM GPU SVM:
>>>>>>>>>>       drm_gpusvm_notifier_invalidate()
>>>>>>>>>>         down_write(&gpusvm->notifier_lock);
>>>>>>>>>>         mmu_interval_set_seq(mni, cur_seq);
>>>>>>>>>>         gpusvm->ops->invalidate()
>>>>>>>>>>           -> xe_svm_invalidate()
>>>>>>>>>>              drm_gpusvm_for_each_range()
>>>>>>>>>>                -> __drm_gpusvm_unmap_pages()
>>>>>>>>>>                   WRITE_ONCE(flags.has_dma_mapping = false);  // clear flag
>>>>>>>>>>         up_write(&gpusvm->notifier_lock);
>>>>>>>>>>
>>>>>>>>>> KFD batch userptr:
>>>>>>>>>>       amdgpu_amdkfd_evict_userptr_batch()
>>>>>>>>>>         mutex_lock(&process_info->notifier_lock);
>>>>>>>>>>         mmu_interval_set_seq(mni, cur_seq);
>>>>>>>>>>         discard_invalid_ranges()
>>>>>>>>>>           interval_tree_iter_first/next()
>>>>>>>>>>             range_info->valid = false;          // clear flag
>>>>>>>>>>         mutex_unlock(&process_info->notifier_lock);
>>>>>>>>>>
>>>>>>>>>> Both implementations:
>>>>>>>>>>       - Acquire notifier_lock FIRST, before any flag changes
>>>>>>>>>>       - Call mmu_interval_set_seq() under the lock
>>>>>>>>>>       - Use an interval tree to find affected ranges within the wide notifier
>>>>>>>>>>       - Mark the per-range flag as invalid/valid under the lock
>>>>>>>>>>
>>>>>>>>>> The page fault path and final validation path also follow the same
>>>>>>>>>> pattern as DRM GPU SVM: fault outside the lock, set/check the
>>>>>>>>>> per-range flag under the lock.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Honglei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2026/2/6 21:56, Christian König wrote:
>>>>>>>>>>> On 2/6/26 07:25, Honglei Huang wrote:
>>>>>>>>>>>> From: Honglei Huang
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> This is v3 of the patch series to support allocating multiple non-contiguous
>>>>>>>>>>>> CPU virtual address ranges that map to a single contiguous GPU virtual address.
>>>>>>>>>>>>
>>>>>>>>>>>> v3:
>>>>>>>>>>>> 1. No new ioctl: Reuses existing AMDKFD_IOC_ALLOC_MEMORY_OF_GPU
>>>>>>>>>>>>         - Adds only one flag: KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH
>>>>>>>>>>>
>>>>>>>>>>> That is most likely not the best approach, but Felix or Philip need to comment here since I don't know these IOCTLs well either.
>>>>>>>>>>>
>>>>>>>>>>>>         - When flag is set, mmap_offset field points to range array
>>>>>>>>>>>>         - Minimal API surface change
>>>>>>>>>>>
>>>>>>>>>>> Why a range of VA space for each entry?
>>>>>>>>>>>
>>>>>>>>>>>> 2. Improved MMU notifier handling:
>>>>>>>>>>>>         - Single mmu_interval_notifier covering the VA span [va_min, va_max]
>>>>>>>>>>>>         - Interval tree for efficient lookup of affected ranges during invalidation
>>>>>>>>>>>>         - Avoids per-range notifier overhead mentioned in v2 review
>>>>>>>>>>>
>>>>>>>>>>> That won't work unless you also modify hmm_range_fault() to take multiple VA addresses (or ranges) at the same time.
>>>>>>>>>>>
>>>>>>>>>>> The problem is that we must rely on hmm_range.notifier_seq to detect changes to the page tables in question, but that in turn works only if you have one hmm_range structure and not multiple.
>>>>>>>>>>>
>>>>>>>>>>> What might work is doing an XOR or CRC over all the hmm_range.notifier_seq values you have, but that is a bit flaky.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
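(For context on item 2 above: the registration side is intended to look 
roughly like the sketch below; batch_notifier_ops and the kgd_mem and 
user_range_info field names are simplified placeholders rather than the 
exact contents of the patches. va_min/va_max describe the [va_min, va_max) 
span covering all ranges.)

    static int register_batch_notifier(struct kgd_mem *mem,
                                       struct mm_struct *mm,
                                       uint64_t va_min, uint64_t va_max)
    {
            struct user_range_info *range;
            unsigned int i;
            int ret;

            /* one notifier spanning the whole batch */
            ret = mmu_interval_notifier_insert(&mem->batch_notifier, mm,
                                               va_min, va_max - va_min,
                                               &batch_notifier_ops);
            if (ret)
                    return ret;

            /* per-range nodes so the invalidate callback only touches
             * ranges that actually overlap the invalidated span */
            for (i = 0; i < mem->range_count; i++) {
                    range = &mem->ranges[i];
                    range->it_node.start = range->start;
                    range->it_node.last  = range->start + range->size - 1;
                    interval_tree_insert(&range->it_node, &mem->range_itree);
            }
            return 0;
    }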
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Better code organization: Split into 8 focused patches for easier review
>>>>>>>>>>>>
>>>>>>>>>>>> v2:
>>>>>>>>>>>>         - Each CPU VA range gets its own mmu_interval_notifier for invalidation
>>>>>>>>>>>>         - All ranges validated together and mapped to contiguous GPU VA
>>>>>>>>>>>>         - Single kgd_mem object with array of user_range_info structures
>>>>>>>>>>>>         - Unified eviction/restore path for all ranges in a batch
>>>>>>>>>>>>
>>>>>>>>>>>> Current Implementation Approach
>>>>>>>>>>>> ===============================
>>>>>>>>>>>>
>>>>>>>>>>>> This series implements a practical solution within existing kernel constraints:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Single MMU notifier for VA span: Register one notifier covering the
>>>>>>>>>>>>         entire range from lowest to highest address in the batch
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Interval tree filtering: Use interval tree to efficiently identify
>>>>>>>>>>>>         which specific ranges are affected during invalidation callbacks,
>>>>>>>>>>>>         avoiding unnecessary processing for unrelated address changes
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Unified eviction/restore: All ranges in a batch share eviction and
>>>>>>>>>>>>         restore paths, maintaining consistency with existing userptr handling
>>>>>>>>>>>>
>>>>>>>>>>>> Patch Series Overview
>>>>>>>>>>>> =====================
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 1/8: Add userptr batch allocation UAPI structures
>>>>>>>>>>>>          - KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH flag
>>>>>>>>>>>>          - kfd_ioctl_userptr_range and kfd_ioctl_userptr_ranges_data structures
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 2/8: Add user_range_info infrastructure to kgd_mem
>>>>>>>>>>>>          - user_range_info structure for per-range tracking
>>>>>>>>>>>>          - Fields for batch allocation in kgd_mem
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 3/8: Implement interval tree for userptr ranges
>>>>>>>>>>>>          - Interval tree for efficient range lookup during invalidation
>>>>>>>>>>>>          - mark_invalid_ranges() function
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 4/8: Add batch MMU notifier support
>>>>>>>>>>>>          - Single notifier for entire VA span
>>>>>>>>>>>>          - Invalidation callback using interval tree filtering
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 5/8: Implement batch userptr page management
>>>>>>>>>>>>          - get_user_pages_batch() and set_user_pages_batch()
>>>>>>>>>>>>          - Per-range page array management
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 6/8: Add batch allocation function and export API
>>>>>>>>>>>>          - init_user_pages_batch() main initialization
>>>>>>>>>>>>          - amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch() entry point
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 7/8: Unify userptr cleanup and update paths
>>>>>>>>>>>>          - Shared eviction/restore handling for batch allocations
>>>>>>>>>>>>          - Integration with existing userptr validation flows
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 8/8: Wire up batch allocation in ioctl handler
>>>>>>>>>>>>          - Input validation and range array parsing
>>>>>>>>>>>>          - Integration with existing alloc_memory_of_gpu path
>>>>>>>>>>>>
>>>>>>>>>>>> Testing
>>>>>>>>>>>> =======
>>>>>>>>>>>>
>>>>>>>>>>>> - Multiple scattered malloc() allocations (2-4000+ ranges)
>>>>>>>>>>>> - Various allocation sizes (4KB to 1G+ per range)
>>>>>>>>>>>> - Memory pressure scenarios and eviction/restore cycles
>>>>>>>>>>>> - OpenCL CTS and HIP catch tests in KVM guest environment
>>>>>>>>>>>> - AI workloads: Stable Diffusion, ComfyUI in virtualized environments
>>>>>>>>>>>> - Small LLM inference (3B-7B models)
>>>>>>>>>>>> - Benchmark score: 160,000 - 190,000 (80%-95% of bare metal)
>>>>>>>>>>>> - Performance improvement: 2x-2.4x faster than userspace approach
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your review and feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Honglei Huang
>>>>>>>>>>>>
>>>>>>>>>>>> Honglei Huang (8):
>>>>>>>>>>>>        drm/amdkfd: Add userptr batch allocation UAPI structures
>>>>>>>>>>>>        drm/amdkfd: Add user_range_info infrastructure to kgd_mem
>>>>>>>>>>>>        drm/amdkfd: Implement interval tree for userptr ranges
>>>>>>>>>>>>        drm/amdkfd: Add batch MMU notifier support
>>>>>>>>>>>>        drm/amdkfd: Implement batch userptr page management
>>>>>>>>>>>>        drm/amdkfd: Add batch allocation function and export API
>>>>>>>>>>>>        drm/amdkfd: Unify userptr cleanup and update paths
>>>>>>>>>>>>        drm/amdkfd: Wire up batch allocation in ioctl handler
>>>>>>>>>>>>
>>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  23 +
>>>>>>>>>>>>       .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 539 +++++++++++++++++-
>>>>>>>>>>>>       drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 128 ++++-
>>>>>>>>>>>>       include/uapi/linux/kfd_ioctl.h                |  31 +-
>>>>>>>>>>>>       4 files changed, 697 insertions(+), 24 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>