Date: Mon, 28 Apr 2025 09:05:32 +0800
From: Yan Zhao <yan.y.zhao@intel.com>
To: Ackerley Tng
CC: Vishal Annapurve, Chenyi Qiang, ...
Subject: Re: [RFC PATCH 39/39] KVM: guest_memfd: Dynamically split/reconstruct HugeTLB page
Reply-To: Yan Zhao <yan.y.zhao@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Fri, Apr 25, 2025 at 03:45:20PM -0700, Ackerley Tng wrote:
> Yan Zhao writes:
>
> > On Thu, Apr 24, 2025 at 11:15:11AM -0700, Ackerley Tng wrote:
> >> Vishal Annapurve writes:
> >>
> >> > On Thu, Apr 24, 2025 at 1:15 AM Yan Zhao wrote:
> >> >>
> >> >> On Thu, Apr 24, 2025 at 01:55:51PM +0800, Chenyi Qiang wrote:
> >> >> >
> >> >> >
> >> >> > On 4/24/2025 12:25 PM, Yan Zhao wrote:
> >> >> > > On Thu, Apr 24, 2025 at 09:09:22AM +0800, Yan Zhao wrote:
> >> >> > >> On Wed, Apr 23, 2025 at 03:02:02PM -0700, Ackerley Tng wrote:
> >> >> > >>> Yan Zhao writes:
> >> >> > >>>
> >> >> > >>>> On Tue, Sep 10, 2024 at 11:44:10PM +0000, Ackerley Tng wrote:
> >> >> > >>>>> +/*
> >> >> > >>>>> + * Allocates and then caches a folio in the filemap. Returns a folio with
> >> >> > >>>>> + * refcount of 2: 1 after allocation, and 1 taken by the filemap.
> >> >> > >>>>> + */
> >> >> > >>>>> +static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio(struct inode *inode,
> >> >> > >>>>> +                                                             pgoff_t index)
> >> >> > >>>>> +{
> >> >> > >>>>> +        struct kvm_gmem_hugetlb *hgmem;
> >> >> > >>>>> +        pgoff_t aligned_index;
> >> >> > >>>>> +        struct folio *folio;
> >> >> > >>>>> +        int nr_pages;
> >> >> > >>>>> +        int ret;
> >> >> > >>>>> +
> >> >> > >>>>> +        hgmem = kvm_gmem_hgmem(inode);
> >> >> > >>>>> +        folio = kvm_gmem_hugetlb_alloc_folio(hgmem->h, hgmem->spool);
> >> >> > >>>>> +        if (IS_ERR(folio))
> >> >> > >>>>> +                return folio;
> >> >> > >>>>> +
> >> >> > >>>>> +        nr_pages = 1UL << huge_page_order(hgmem->h);
> >> >> > >>>>> +        aligned_index = round_down(index, nr_pages);
> >> >> > >>>> Maybe a gap here.
> >> >> > >>>>
> >> >> > >>>> When a guest_memfd is bound to a slot where slot->base_gfn is not aligned to
> >> >> > >>>> 2M/1G and slot->gmem.pgoff is 0, even if an index is 2M/1G aligned, the
> >> >> > >>>> corresponding GFN is not 2M/1G aligned.
> >> >> > >>>
> >> >> > >>> Thanks for looking into this.
> >> >> > >>>
> >> >> > >>> In 1G page support for guest_memfd, the offset and size are always
> >> >> > >>> hugepage aligned to the hugepage size requested at guest_memfd creation
> >> >> > >>> time, and it is true that when binding to a memslot, slot->base_gfn and
> >> >> > >>> slot->npages may not be hugepage aligned.
> >> >> > >>>
> >> >> > >>>>
> >> >> > >>>> However, TDX requires that private huge pages be 2M aligned in GFN.
> >> >> > >>>>
> >> >> > >>>
> >> >> > >>> IIUC other factors also contribute to determining the mapping level in
> >> >> > >>> the guest page tables, like lpage_info and .private_max_mapping_level()
> >> >> > >>> in kvm_x86_ops.
> >> >> > >>>
> >> >> > >>> If slot->base_gfn and slot->npages are not hugepage aligned, lpage_info
> >> >> > >>> will track that and not allow faulting into guest page tables at higher
> >> >> > >>> granularity.
> >> >> > >>
> >> >> > >> lpage_info only checks the alignments of slot->base_gfn and
> >> >> > >> slot->base_gfn + npages. e.g.,
> >> >> > >>
> >> >> > >> if slot->base_gfn is 8K, npages is 8M, then for this slot,
> >> >> > >> lpage_info[2M][0].disallow_lpage = 1, which is for GFN [4K, 2M+8K);
> >> >> > >> lpage_info[2M][1].disallow_lpage = 0, which is for GFN [2M+8K, 4M+8K);
> >> >> > >> lpage_info[2M][2].disallow_lpage = 0, which is for GFN [4M+8K, 6M+8K);
> >> >> > >> lpage_info[2M][3].disallow_lpage = 1, which is for GFN [6M+8K, 8M+8K);
> >> >> > >
> >> >> > Should it be?
> >> >> > lpage_info[2M][0].disallow_lpage = 1, which is for GFN [8K, 2M);
> >> >> > lpage_info[2M][1].disallow_lpage = 0, which is for GFN [2M, 4M);
> >> >> > lpage_info[2M][2].disallow_lpage = 0, which is for GFN [4M, 6M);
> >> >> > lpage_info[2M][3].disallow_lpage = 0, which is for GFN [6M, 8M);
> >> >> > lpage_info[2M][4].disallow_lpage = 1, which is for GFN [8M, 8M+8K);
> >> >> Right. Good catch. Thanks!
> >> >>
> >> >> Let me update the example as below:
> >> >> slot->base_gfn is 2 (for GPA 8KB), npages 2048 (for an 8MB range)
> >> >>
> >> >> lpage_info[2M][0].disallow_lpage = 1, which is for GPA [8KB, 2MB);
> >> >> lpage_info[2M][1].disallow_lpage = 0, which is for GPA [2MB, 4MB);
> >> >> lpage_info[2M][2].disallow_lpage = 0, which is for GPA [4MB, 6MB);
> >> >> lpage_info[2M][3].disallow_lpage = 0, which is for GPA [6MB, 8MB);
> >> >> lpage_info[2M][4].disallow_lpage = 1, which is for GPA [8MB, 8MB+8KB);
> >> >>
> >> >> lpage_info indicates that a 2MB mapping is allowed to cover GPA 4MB and GPA
> >> >> 4MB+16KB. However, their aligned_index values lead guest_memfd to allocate two
> >> >> 2MB folios, whose physical addresses may not be contiguous.
> >> >>
> >> >> Additionally, if the guest accesses two GPAs, e.g., GPA 2MB+8KB and GPA 4MB,
> >> >> KVM could create two 2MB mappings to cover GPA ranges [2MB, 4MB), [4MB, 6MB).
> >> >> However, guest_memfd just allocates the same 2MB folio for both faults.
> >> >>
> >> >>
> >> >> >
> >> >> > >>
> >> >> > >> ---------------------------------------------------------
> >> >> > >> |      |      |       |      |       |      |       |      |
> >> >> > >> 8K     2M    2M+8K    4M    4M+8K    6M    6M+8K    8M    8M+8K
> >> >> > >>
> >> >> > >> For GFN 6M and GFN 6M+4K, as they both belong to lpage_info[2M][2], huge
> >> >> > >> page is allowed. Also, they have the same aligned_index 2 in guest_memfd.
> >> >> > >> So, guest_memfd allocates the same huge folio of 2M order for them.
> >> >> > > Sorry, sent too fast this morning. The example is not right. The correct
> >> >> > > one is:
> >> >> > >
> >> >> > > For GFN 4M and GFN 4M+16K, lpage_info indicates that 2M is allowed. So,
> >> >> > > KVM will create a 2M mapping for them.
> >> >> > >
> >> >> > > However, in guest_memfd, GFN 4M and GFN 4M+16K do not correspond to the
> >> >> > > same 2M folio and physical addresses may not be contiguous.
> >> >
> >> > Then during binding, guest memfd offset misalignment with hugepage
> >> > should be same as gfn misalignment. i.e.
> >> >
> >> > (offset & ~huge_page_mask(h)) == ((slot->base_gfn << PAGE_SHIFT) &
> >> > ~huge_page_mask(h));
> >> >
> >> > For non guest_memfd backed scenarios, KVM allows slot gfn ranges that
> >> > are not hugepage aligned, so guest_memfd should also be able to
> >> > support non-hugepage aligned memslots.
> >> >
> >>
> >> I drew up a picture [1] which hopefully clarifies this.
> >>
> >> Thanks for pointing this out, I understand better now and we will add an
> >> extra constraint during memslot binding of guest_memfd to check that the gfn
> >> offset within a hugepage matches the guest_memfd offset within a hugepage.
> > I'm a bit confused.
> >
> > As "index = gfn - slot->base_gfn + slot->gmem.pgoff", do you mean you are going
> > to force "slot->base_gfn == slot->gmem.pgoff" ?
> >
> > For some memory region, e.g., "pc.ram", it's divided into 2 parts:
> > - one with offset 0, size 0x80000000(2G),
> >   positioned at GPA 0, which is below GPA 4G;
> > - one with offset 0x80000000(2G), size 0x80000000(2G),
> >   positioned at GPA 0x100000000(4G), which is above GPA 4G.
> >
> > For the second part, its slot->base_gfn is 0x100000000, while slot->gmem.pgoff
> > is 0x80000000.
> >
>
> Nope I don't mean to enforce that they are equal, we just need the
> offsets within the page to be equal.
>
> I edited Vishal's code snippet, perhaps it would help explain better:
>
> page_size is the size of the hugepage, so in our example,
>
> page_size = SZ_2M;
> page_mask = ~(page_size - 1);
Should it be page_mask = page_size - 1 ?
> offset_within_page = slot->gmem.pgoff & page_mask;
> gfn_within_page = (slot->base_gfn << PAGE_SHIFT) & page_mask;
>
> We will enforce that
>
> offset_within_page == gfn_within_page;
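
Just to restate the proposed check with consistent units (byte offsets on both
sides, and the within-hugepage mask spelled as page_size - 1, per the question
above) -- the helper below is only an illustration with a made-up name, not
code from the series:

#include <stdbool.h>

#define PAGE_SHIFT      12

/*
 * Illustration only: do the slot's base GPA and its guest_memfd offset have
 * the same offset within one hugepage? Mirrors Vishal's
 * (offset & ~huge_page_mask(h)) == ((slot->base_gfn << PAGE_SHIFT) & ~huge_page_mask(h)).
 */
static bool offsets_share_hugepage_alignment(unsigned long base_gfn,
                                             unsigned long gmem_pgoff,
                                             unsigned long hugepage_size)
{
        unsigned long mask = hugepage_size - 1;         /* i.e. ~huge_page_mask(h) */
        unsigned long gpa_off = (base_gfn << PAGE_SHIFT) & mask;
        unsigned long gmem_off = (gmem_pgoff << PAGE_SHIFT) & mask;

        return gpa_off == gmem_off;
}

With hugepage_size = SZ_2M both example slots below pass this check, but with
SZ_1G the second "pc.ram" slot does not, which is the problem described next.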

For "pc.ram", if it has 2.5G below 4G, it would be configured as follows:
- slot 1: slot->gmem.pgoff=0, base GPA 0, size=2.5G
- slot 2: slot->gmem.pgoff=2.5G, base GPA 4G, size=1.5G

When binding these two slots to the same guest_memfd created with flag
KVM_GUEST_MEMFD_HUGE_1GB:
- binding the 1st slot will succeed;
- binding the 2nd slot will fail.

What options does userspace have in this scenario? It can't reduce the flag to
KVM_GUEST_MEMFD_HUGE_2MB, and adjusting slot->gmem.pgoff isn't ideal either.

What about something similar to the below?

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index d2feacd14786..87c33704a748 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1842,8 +1842,16 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
 	}

 	*pfn = folio_file_pfn(folio, index);
-	if (max_order)
-		*max_order = folio_order(folio);
+	if (max_order) {
+		int order;
+
+		order = folio_order(folio);
+
+		while (order > 0 && ((slot->base_gfn ^ slot->gmem.pgoff) & ((1 << order) - 1)))
+			order--;
+
+		*max_order = order;
+	}

 	*is_prepared = folio_test_uptodate(folio);
 	return folio;

> >> Adding checks at binding time will allow hugepage-unaligned offsets (to
> >> be at parity with non-guest_memfd backing memory) but still fix this
> >> issue.
> >>
> >> lpage_info will make sure that ranges near the bounds will be
> >> fragmented, but the hugepages in the middle will still be mappable as
> >> hugepages.
> >>
> >> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3706/binding-must-have-same-alignment.svg
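
To double-check the arithmetic for the "pc.ram" layout above, here is a tiny
user-space sketch (scaffolding only, not kernel code) that runs the same
clamping loop from the proposed diff on the second slot's numbers; it settles
on order 17, so a 1GB mapping is refused for that slot while 2MB mappings
remain possible:

#include <stdio.h>

#define PAGE_SHIFT      12

int main(void)
{
        /* Second "pc.ram" slot from the example above (illustrative numbers). */
        unsigned long base_gfn = 0x100000000UL >> PAGE_SHIFT;   /* base GPA 4G  */
        unsigned long gmem_pgoff = 0xA0000000UL >> PAGE_SHIFT;  /* offset 2.5G  */
        int order = 18;         /* folio_order() of a 1GB folio with 4KB pages */

        /* Same loop as in the proposed __kvm_gmem_get_pfn() change. */
        while (order > 0 && ((base_gfn ^ gmem_pgoff) & ((1UL << order) - 1)))
                order--;

        printf("max_order = %d\n", order);      /* prints 17: no 1GB, 2MB still OK */
        return 0;
}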