From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4dc1d2e1-d5d7-2812-aa8b-f8ba6b9fb207@arm.com>
Date: Mon, 10 Jul 2023 14:28:15 +0100
Subject: Re: [PATCH v1 00/14] Transparent Contiguous PTEs for User Mappings
To: Barry Song <21cnbao@gmail.com>
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
 Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
 Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
 Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox,
 Yu Zhao, Mark Rutland, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230622144210.2623299-1-ryan.roberts@arm.com>
From: Ryan Roberts

On 10/07/2023 13:05, Barry Song wrote:
> On Thu, Jun 22, 2023 at 11:00 PM Ryan Roberts wrote:
>>
>> Hi All,
>>
[...]
>>
>> Performance
>> -----------
>>
>> Below results show 2 benchmarks; kernel compilation and speedometer 2.0 (a
>> javascript benchmark running in Chromium). Both cases are running on Ampere
>> Altra with 1 NUMA node enabled, Ubuntu 22.04 and XFS filesystem. Each benchmark
>> is repeated 15 times over 5 reboots and averaged.
>>
>> All improvements are relative to baseline-4k. anonfolio and exefolio are as
>> described above. contpte is this series. (Note that exefolio only gives an
>> improvement because contpte is already in place).
>>
>> Kernel Compilation (smaller is better):
>>
>> | kernel       |   real-time |   kern-time |   user-time |
>> |:-------------|------------:|------------:|------------:|
>> | baseline-4k  |        0.0% |        0.0% |        0.0% |
>> | anonfolio    |       -5.4% |      -46.0% |       -0.3% |
>> | contpte      |       -6.8% |      -45.7% |       -2.1% |
>> | exefolio     |       -8.4% |      -46.4% |       -3.7% |
>
> sorry i am a bit confused. in exefolio case, is anonfolio included?
> or it only has large cont-pte folios on exe code? in the other words,
> Does the 8.4% improvement come from iTLB miss reduction only,
> or from both dTLB and iTLB miss reduction?

The anonfolio -> contpte -> exefolio results are incremental. So:

anonfolio: baseline-4k + anonfolio changes
contpte:   anonfolio + contpte changes
exefolio:  contpte + exefolio changes

So yes, exefolio includes anonfolio. Sorry for the confusion.

>
>> | baseline-16k |       -8.7% |      -49.2% |       -3.7% |
>> | baseline-64k |      -10.5% |      -66.0% |       -3.5% |
>>
>> Speedometer 2.0 (bigger is better):
>>
>> | kernel       |   runs_per_min |
>> |:-------------|---------------:|
>> | baseline-4k  |           0.0% |
>> | anonfolio    |           1.2% |
>> | contpte      |           3.1% |
>> | exefolio     |           4.2% |
>
> same question as above.

same answer as above.

Thanks,
Ryan

>
>> | baseline-16k |           5.3% |
>>
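
Since the anonfolio -> contpte -> exefolio rows are stacked, each row's figure is cumulative against baseline-4k. As a rough sketch (not something the thread itself computes), the per-step contribution can be read off by differencing consecutive rows of the kernel-compilation real-time column. The names `STACK` and `per_step_deltas` are made up for illustration, and the deltas are percentage points of the baseline-4k time, not independent speedups:

```python
# Cumulative real-time change vs baseline-4k, copied from the quoted
# kernel-compilation table (percent; negative means faster).
STACK = [
    ("baseline-4k", 0.0),
    ("anonfolio", -5.4),
    ("contpte", -6.8),
    ("exefolio", -8.4),
]


def per_step_deltas(stack):
    """Return (name, delta) pairs: each step's change over the step below it,
    in percentage points of the baseline-4k time (hypothetical helper)."""
    deltas = []
    for (_, prev), (name, cur) in zip(stack, stack[1:]):
        # Round to one decimal place, matching the precision of the table.
        deltas.append((name, round(cur - prev, 1)))
    return deltas


if __name__ == "__main__":
    for name, delta in per_step_deltas(STACK):
        print(f"{name:10s} {delta:+.1f} points")
```

Read this way, the exefolio row's -8.4% is the sum of all three steps, which matches the answer above that exefolio includes anonfolio.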