From: Zi Yan
To: Mike Kravetz
Cc: Yang Shi, David Hildenbrand, David Rientjes, Yosry Ahmed,
 James Houghton, Naoya Horiguchi, Miaohe Lin,
 lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Peter Xu,
 Michal Hocko, Matthew Wilcox, Axel Rasmussen, Jiaqi Yan
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
Date: Thu, 08 Jun 2023 21:57:34 -0400
Message-ID: <6B42EC7F-7EB6-45E0-AF4D-F4F0FA7A012E@nvidia.com>
In-Reply-To: <20230608212336.GA88798@monkey>
References: <20230602172723.GA3941@monkey>
 <7e0ce268-f374-8e83-2b32-7c53f025fec5@google.com>
 <7c42a738-d082-3338-dfb5-fd28f75edc58@redhat.com>
 <75d5662a-a901-1e02-4706-66545ad53c5c@redhat.com>
 <20230607220651.GC4122@monkey>
 <686e3e61-704e-1258-8a8b-f18399b41668@google.com>
 <20230608212336.GA88798@monkey>

On 8 Jun 2023, at 17:23, Mike Kravetz wrote:

> On 06/08/23 11:50, Yang Shi wrote:
>> On Wed, Jun 7, 2023 at 11:34 PM David Hildenbrand wrote:
>>>
>>> On 08.06.23 02:02, David Rientjes wrote:
>>>> On Wed, 7 Jun 2023, Mike Kravetz wrote:
>>>>
>>>>>>>>> Are there strong objections to extending hugetlb for this support?
>>>>>>>>
>>>>>>>> I don't want to get too involved in this discussion (busy), but I
>>>>>>>> absolutely agree on the points that were raised at LSF/MM that
>>>>>>>>
>>>>>>>> (A) hugetlb is complicated and very special (many things not integrated
>>>>>>>> with core-mm, so we need special-casing all over the place). [example:
>>>>>>>> what is a pte?]
>>>>>>>>
>>>>>>>> (B) We added a bunch of complexity in the past that some people
>>>>>>>> considered very important (and it was not feature frozen, right? ;) ).
>>>>>>>> Looking back, we might just not have done some of that, or done it
>>>>>>>> differently/cleaner -- better integrated in the core. (PMD sharing,
>>>>>>>> MAP_PRIVATE, a reservation mechanism that still requires preallocation
>>>>>>>> because it fails with NUMA/fork, ...)
>>>>>>>>
>>>>>>>> (C) Unifying hugetlb and the core looks like it's getting more and more
>>>>>>>> out of reach, maybe even impossible with all the complexity we added
>>>>>>>> over the years (well, and keep adding).
>>>>>>>>
>>>>>>>> Sure, HGM for the purpose of better hwpoison handling makes sense. But
>>>>>>>> hugetlb is probably 20 years old and hwpoison handling probably 13 years
>>>>>>>> old. So we managed to get quite far without that optimization.
>>>>>>>>
>>>>
>>>> Sane handling for memory poisoning and optimizations for live migration
>>>> are both much more important for the real-world 1GB hugetlb user, so it
>>>> doesn't quite have that lengthy of a history.
>>>>
>>>> Unfortunately, cloud providers receive complaints about both of these from
>>>> customers. They are one of the most significant causes for poor customer
>>>> experience.
>>>>
>>>> While people have proposed 1GB THP support in the past, it was nacked, in
>>>> part, because of the suggestion to just use existing 1GB support in
>>>> hugetlb instead :)
>>
>> Yes, but it was before HGM was proposed, we may revisit it.
>>
>
> Adding Zi Yan on CC as the person driving 1G THP.

Thanks. I did not attend LSF/MM, but the points above mostly look valid.
IMHO, if we keep adding new features to hugetlbfs, we might end up with two
parallel memory systems that replicate each other a lot. Maybe it is time
to think about how to merge hugetlbfs features back into core mm.

From my understanding, the most desirable user-visible feature of hugetlbfs
is that it provides deterministic huge page allocation, since huge pages
are preserved. If we can preserve that, replacing the hugetlbfs backend with
THP or even just plain folios should be good enough. Let me know if I am
missing any important user-visible feature.

On the hugetlbfs backend, PMD sharing, MAP_PRIVATE, and reducing struct page
storage all look like features core mm might want. Merging these features
back into core mm might be a good first step.

I thought about replacing the hugetlbfs backend with THP (with my 1GB THP
support), but found that not all THP features are necessary for hugetlbfs
users or compatible with existing hugetlbfs. For example, hugetlbfs does not
need transparent page split, since the user just wants that big page size.
And page split might not get along with the struct page storage reduction
feature.

In sum, I think we might not need all THP features (page table entry split
and huge page split) to replace hugetlbfs. We might just need to enable
core mm to handle folios of any size, with hugetlb pages being just folios
that can go as large as 1GB. As a result, hugetlb pages could take advantage
of all core mm features, like hwpoison.

>>>
>>> Yes, because I still think that the use for "transparent" (for the user)
>>> nowadays is very limited and not worth the complexity.
>>>
>>> IMHO, what you really want is a pool of large pages that (guarantees
>>> about availability and nodes) and fine control about who gets these
>>> pages. That's what hugetlb provides.
>>
>> The most concern for 1G THP is the allocation time. But I don't think
>> it is a no-go for allocating THP from a preallocated pool, for
>> example, CMA.
>
> I seem to remember Zi trying to use CMA for 1G THP allocations. However, I
> am not sure if using CMA would be sufficient. IIUC, allocating from CMA could
> still require page migrations to put together a 1G contiguous area. In a pool
> as used by hugetlb, 1G pages are pre-allocated and sitting in the pool. The
> downside of such a pool is that the memory can not be used for other purposes
> and sits 'idle' if not allocated.

Yes, I tried that. One big issue is that at free time a 1GB THP needs to be
freed back to a CMA pool instead of the buddy allocator, but a THP can be
split, and after a split it is really hard to tell whether a page is from a
CMA pool or not. hugetlb pages do not support page split yet, so the issue
might not be relevant there. But if a THP cannot be split freely, is it
still a THP? So it comes back to my question: do we really want 1GB THP, or
just a core mm that can handle folios of any size?

>
> Hate to even bring this up, but there are complaints today about 'allocation
> time' of 1GB pages from the hugetlb pool. This 'allocation time' is actually
> the time it takes to clear/zero 1G of memory. Only reason I mention is
> using something like CMA to allocate 1G pages (at fault time) may add
> unacceptable latency.

One solution I had in mind is that you could zero these 1GB pages at free
time in a worker thread, so that you do not pay the penalty at page
allocation time. But it would not work if the allocation comes right after
a page is freed.

>
>>>
>>> In contrast to THP, you don't want to allow for
>>> * Partially mmap, mremap, munmap, mprotect them
>>> * Partially sharing them / COW'ing them
>>> * Partially mixing them with other anon pages (MADV_DONTNEED + refault)
>>
>> IIRC, QEMU treats hugetlbfs as 2M block size, we should be able to
>> teach QEMU to treat tmpfs + THP as 2M block size too. I used to have a
>> patch to make stat.st_blksize return THP size for tmpfs (89fdcd262fd4
>> mm: shmem: make stat.st_blksize return huge page size if THP is on).
>> So when the applications are aware of the 2M or 1G page/block size,
>> hopefully it may help reduce the partial mapping things. But I'm not
>> an expert on QEMU, I may miss something.
>>
>>> * Exclude them from some features KSM/swap
>>> * (swap them out and eventually split them for that)
>>
>> We have "noswap" mount option for tmpfs now, so swap is not a problem.
>>
>> But we may lose some features, for example, PMD sharing, hugetlb
>> cgroup, etc. Not sure whether they are a showstopper or not.
>>
>> So it sounds easier to have 1G THP than HGM IMHO if I don't miss
>> something vital.
>
> I have always wanted to experiment with having THP use a pre-allocated
> pool for huge page allocations. Of course, this adds the complication
> of what to do when the pool is exhausted.
>
> Perhaps Zi has performed such experiments?

Using CMA allocation is a similar experiment, but when CMA pools are
exhausted, 1GB THP allocation will fail. We can try to use compaction to
get more free 1GB pages, but that might take a prohibitively long time and
could still fail in the end.

Finally, let me ask this again: do we want 1GB THP to replace hugetlb, or
do we want to enable core mm to handle folios of any size and change the
1GB hugetlb page into a 1GB folio?
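
To make the zero-at-free idea from earlier in this mail a bit more concrete,
here is a toy user-space model in Python. It is not kernel code and not an
existing interface: the PreZeroedPool class, its worker_step() method, and
the 4KB stand-in page size are all hypothetical names and values made up for
illustration. It only shows the ordering problem: pages zeroed by the
background step are handed out on a fast path, while an allocation that
arrives before the worker has run still pays for zeroing inline.

```python
from collections import deque

PAGE_SIZE = 4096  # hypothetical stand-in; a real 1GB page would be 1 << 30


class PreZeroedPool:
    """Toy model of a huge page pool that zeroes pages at free time."""

    def __init__(self, nr_pages):
        # Pages start out zeroed, as if cleared when the pool was set up.
        self.clean = deque(bytearray(PAGE_SIZE) for _ in range(nr_pages))
        self.dirty = deque()   # freed pages that have not been zeroed yet
        self.slow_allocs = 0   # allocations that had to zero a page inline

    def free(self, page):
        # Freeing is cheap: only queue the page for the background worker.
        self.dirty.append(page)

    def worker_step(self):
        """Zero one dirty page; stands in for one pass of a worker thread."""
        if not self.dirty:
            return False
        page = self.dirty.popleft()
        page[:] = bytes(len(page))
        self.clean.append(page)
        return True

    def alloc(self):
        if self.clean:
            # Fast path: the page was already zeroed in the background.
            return self.clean.popleft()
        if self.dirty:
            # Slow path: allocation raced ahead of the worker, zero inline.
            page = self.dirty.popleft()
            page[:] = bytes(len(page))
            self.slow_allocs += 1
            return page
        raise MemoryError("pool exhausted")
```

With a pool of one page, alloc() directly after free() hits the slow path,
which is the "allocation comes right after a page is freed" case; if
worker_step() runs in between, the next alloc() takes the fast path.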

--
Best Regards,
Yan, Zi