From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zi Yan
To: Johannes Weiner
Cc: Yu Zhao, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
 Jonathan Corbet, Kaiyang Zhao
Subject: Re: [LSF/MM/BPF TOPIC] TAO: THP Allocator Optimizations
Date: Wed, 06 Mar 2024 11:40:23 -0500
In-Reply-To: <20240306155110.GB891917@cmpxchg.org>
References: <20240229183436.4110845-1-yuzhao@google.com>
 <20240306155110.GB891917@cmpxchg.org>

On 6 Mar 2024, at 10:51, Johannes Weiner wrote:

> On Thu, Feb 29, 2024 at 11:34:32AM -0700, Yu Zhao wrote:
>> TAO is an umbrella project aiming at a better economy of physical
>> contiguity viewed as a valuable resource. A few examples are:
>> 1. A multi-tenant system can have guaranteed THP coverage while
>>    hosting abusers/misusers of the resource.
>> 2. Abusers/misusers, e.g., workloads excessively requesting and then
>>    splitting THPs, should be punished if necessary.
>> 3. Good citizens should be awarded with, e.g., lower allocation
>>    latency and less cost of metadata (struct page).
>> 4. Better interoperability with userspace memory allocators when
>>    transacting the resource.
>>
>> This project puts the same emphasis on the established use case for
>> servers and the emerging use case for clients so that client workloads
>> like Android and ChromeOS can leverage the recent multi-sized THPs
>> [1][2].
>>
>> Chapter One introduces the cornerstone of TAO: an abstraction called
>> policy (virtual) zones, which are overlaid on the physical zones.
>> This is in line with item 1 above.
>
> This is a very interesting topic to me. Meta has collaborated with CMU
> to research this as well, the results of which are typed up here:
> https://dl.acm.org/doi/pdf/10.1145/3579371.3589079
>
> We had used a dynamic CMA region, but unless I'm missing something
> about the policy zones this is just another way to skin the cat.
>
> The other difference is that we made the policy about migratetypes
> rather than order. The advantage of doing it by order is of course
> that you can forego a lot of compaction work to begin with. The
> downside is that you have to be more precise and proactive about
> sizing the THP vs non-THP regions correctly, as it's more restrictive
> than saying "this region just has to remain compactable, but is good
> for small and large pages" - most workloads will have a mix of those.
>
> For region sizing, I see that for now you have boot parameters. But
> the exact composition of orders that a system needs is going to vary
> by workload, and likely within workloads over time. IMO some form of
> auto-sizing inside the kernel will make the difference between this
> being a general-purpose OS feature and "this is useful to hyperscalers
> that control their whole stack, have resources to profile their
> applications in-depth, and can tailor-make kernel policies around the
> results" - not unlike hugetlb itself.
>
> What we had experimented with is a feedback system between the
> regions. It tracks the amount of memory pressure that exists for the
> pages in each section - i.e. how much reclaim and compaction is needed
> to satisfy allocations from a given region, and how many refaults and
> swapins are occurring in them - and then moves the boundaries
> accordingly if there is an imbalance.
>
> The first draft of this was an extension to psi to track pressure by
> allocation context. This worked quite well, but was a little fat on
> the scheduler cacheline footprint. Kaiyang (CC'd) has been working on
> tracking these input metrics in a leaner fashion.
>
> You mentioned a pageblock-oriented solution also in Chapter One. I had
> proposed one before, so I'm obviously biased, but my gut feeling is
> that we likely need both - one for 2MB and smaller, and one for
> 1GB. My thinking is this:
>
> 1. Contiguous zones are more difficult and less reliable to resize at
>    runtime, and the huge page size you're trying to grow and shrink
>    the regions for matters. Assuming 4k pages (wild, I know) there are
>    512 pages in a 2MB folio, but a quarter million pages in a 1GB
>    folio. It's much easier for a single die-hard kernel allocation to
>    get in the way of expanding the THP region by another 1GB page than
>    finding 512 disjunct 2MB pageblocks somewhere.
>
>    Basically, dynamic adaptiveness of the pool seems necessary for a
>    general-purpose THP[tm] feature, but I also think adaptiveness for
>    1G huge pages is going to be difficult to pull off reliably, simply
>    because we have no control over the lifetime of kernel allocations.
>
> 2. I think there also remains a difference in audience. Reliable
>    coverage of up to 2MB would be a huge boon for most workloads,
>    especially the majority of those that are not optimized much for
>    contiguity. IIRC Willy mentioned before somewhere that nowadays the
>    optimal average page size is still in the multi-k range.
>
>    1G huge pages are immensely useful for specific loads - we
>    certainly have our share of those as well. But the step size to 1GB
>    is so large that:
>
>    1) it's fewer applications that can benefit in the first place
>
>    2) it requires applications to participate more proactively in the
>       contiguity efforts to keep internal fragmentation reasonable
>
>    3) the 1G huge pages are more expensive and less reliable when it
>       comes to growing the THP region by another page at runtime,
>       which remains a forcing function for static, boot-time configs
>
>    4) the performance impact of falling back from 1G to 2MB or 4k
>       would be quite large compared to falling back from 2M. Setups
>       that invest to overcome all of the above difficulties in order
>       to tickle more cycles out of their systems are going to be less
>       tolerant of just falling back to smaller pages
>
>    As you can see, points 2-4 take a lot of the "transparent" out of
>    "transparent huge pages".

Also, based on my past experience, there are implementation challenges
for 1GB THP:

1) I had triple mapping (PTE, PMD, PUD) support for 1GB THP in my
   original patchset, but the implementation is quite hacky and
   complicated. Subpage mapcount is going to be a headache to
   maintain, so we probably do not want to support triple mapping.

2) Page migration was not in my patchset due to high migration
   overheads, although the implementation might not be hard. At least,
   splitting a 1GB THP upon migration should be added to make it
   movable; otherwise, it might cause performance issues on NUMA
   systems.

3) Creating a 1GB THP at page fault time might cause a long latency.
   When to create one and who can create one will need to be
   discussed. khugepaged or process_madvise are candidates.

So we are more likely to end up with a 1GB large folio without much of
the transparent feature.

--
Best Regards,
Yan, Zi
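
As a rough illustration of the sizes discussed above (512 pages in a
2MB folio versus a quarter million in a 1GB folio, and the PTE/PMD/PUD
levels involved in triple mapping), here is a minimal sketch of the
arithmetic. It assumes 4KB base pages with 2MB PMD-level and 1GB
PUD-level huge pages, as on x86-64; the names BASE_PAGE, PMD_SIZE,
PUD_SIZE and units_per_folio() are made up for this example and are not
kernel identifiers.

```python
# Assumed sizes: 4 KiB base pages, 2 MiB (PMD-level) and 1 GiB (PUD-level)
# huge pages, as on x86-64. Other architectures and configurations differ.
BASE_PAGE = 4 << 10
PMD_SIZE = 2 << 20
PUD_SIZE = 1 << 30


def units_per_folio(folio_size: int, unit_size: int) -> int:
    """Return how many unit_size pieces make up one folio of folio_size bytes."""
    if folio_size % unit_size:
        raise ValueError("folio size must be a multiple of the unit size")
    return folio_size // unit_size


if __name__ == "__main__":
    # Base pages that must all be free and movable to form one folio.
    print("4K pages per 2MB folio:", units_per_folio(PMD_SIZE, BASE_PAGE))  # 512
    print("4K pages per 1GB folio:", units_per_folio(PUD_SIZE, BASE_PAGE))  # 262144
    # A 1GB folio mapped at PMD level needs this many PMD entries.
    print("2MB units per 1GB folio:", units_per_folio(PUD_SIZE, PMD_SIZE))  # 512
```

The factor of 512 between each level is why a 1GB folio that can be
mapped by one PUD, 512 PMDs, or 262144 PTEs makes per-subpage mapcount
tracking so much heavier than in the 2MB case.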