Date: Tue, 25 Jun 2024 11:11:46 -0700 (PDT)
From: "Christoph Lameter (Ampere)"
To: Ryan Roberts
Cc: Yang Shi, Jonathan Cameron, lsf-pc@lists.linux-foundation.org,
    olivier.singla@amperecomputing.com, Linux MM, Michal Hocko,
    Dan Williams, Matthew Wilcox, Zi Yan
Subject: Re: [LSF/MM/BPF TOPIC] Multi-sized THP performance benchmarks and analysis on ARM64

On Tue, 25 Jun 2024, Ryan Roberts wrote:

> But I also want to raise a more general point; We are not done with the
> optimizations yet. contpte can also improve performance for iTLB, but this
> requires a change to the page cache to store text in (at least) 64K folios.
> Typically the iTLB is under a lot of pressure and this can help reduce it. This
> change is not in mainline yet (and I still need to figure out how to make the
> patch acceptable), but is worth another ~1.5% for the 4KPS case.
> I suspect this will also move the needle on the other benchmarks you ran.
> See [3] - I'd appreciate any thoughts you have on how to get something like
> this accepted.
>
> [3] https://lore.kernel.org/lkml/20240111154106.3692206-1-ryan.roberts@arm.com/

The discussion here seems to indicate that readahead is already OK for
order-2 (16K mTHP size?). So this is only for 64K mTHP on 4K?

From what I read in the ARM64 manuals, it seems that CONT_PTE can only be
used for 64K mTHP on 4K kernels. Neither the 16K case nor any other
intermediate size below 64K will benefit from CONT_PTE.

Quoting:

https://developer.arm.com/documentation/ddi0406/c/System-Level-Architecture/Virtual-Memory-System-Architecture--VMSA-/Memory-region-attributes/Long-descriptor-format-memory-region-attributes?lang=en#BEIIBEIJ

"Contiguous hint

The Long-descriptor translation table format descriptors contain a
Contiguous hint bit. Setting this bit to 1 indicates that 16 adjacent
translation table entries point to a contiguous output address range. These
16 entries must be aligned in the translation table so that the top 5 bits
of their input addresses, that index their position in the translation
table, are the same. For example, referring to Figure 12.21, to use this
hint for a block of 16 entries in the third-level translation table,
bits[20:16] of the input addresses for the 16 entries must be the same.

The contiguous output address range must be aligned to size of 16
translation table entries at the same translation table level.

Use of this hint means that the TLB can cache a single entry to cover the
16 translation table entries.

This bit is only a hint bit. The architecture does not require a processor
to cache TLB entries in this way. To avoid TLB coherency issues, any TLB
maintenance by address must not assume any optimization of the TLB tables
that might result from use of the hint bit.
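
To make the arithmetic in the quoted text concrete, here is a rough sketch of the alignment rules as I read them. It assumes 4K base pages and a third-level table, so 16 entries cover 64K and bits[15:12] select the entry within the block. The names (CONT_SHIFT, cont_hint_possible(), same_cont_block()) are made up for illustration and are not the kernel's contpte code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for this sketch: 4K base pages, third-level table entries. */
#define PAGE_SHIFT   12u
#define CONT_ENTRIES 16u                        /* entries covered by one hint */
#define CONT_SHIFT   (PAGE_SHIFT + 4u)          /* log2(16 * 4K) = 16 */
#define CONT_SIZE    ((uint64_t)CONT_ENTRIES << PAGE_SHIFT)

static_assert(CONT_SIZE == 64u * 1024u, "16 entries of 4K span 64K");

/*
 * Hypothetical helper: a mapping can carry the contiguous hint only if the
 * input (virtual) and output (physical) addresses are both aligned to the
 * span of 16 entries and the mapping covers at least that span.
 */
static inline bool cont_hint_possible(uint64_t va, uint64_t pa, uint64_t len)
{
    return (va % CONT_SIZE) == 0 && (pa % CONT_SIZE) == 0 && len >= CONT_SIZE;
}

/*
 * Hypothetical helper: two input addresses fall in the same block of 16
 * entries when everything above bits[15:12] matches, which includes the
 * bits[20:16] condition from the quoted text.
 */
static inline bool same_cont_block(uint64_t va_a, uint64_t va_b)
{
    return (va_a >> CONT_SHIFT) == (va_b >> CONT_SHIFT);
}
```

Under these assumptions a 16K (order-2) folio can never satisfy cont_hint_possible() because its length is below CONT_SIZE, which is the point I am making about the intermediate sizes.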