From: Ryan Roberts
Date: Mon, 13 Nov 2023 10:19:48 +0000
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
To: Matthew Wilcox, John Hubbard
Cc: Andrew Morton, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas,
    Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain,
    Itaru Kitayama, "Kirill A. Shutemov", David Rientjes, Vlastimil Babka,
    Hugh Dickins, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org
References: <20230929114421.3761121-1-ryan.roberts@arm.com>

On 13/11/2023 05:18, Matthew Wilcox wrote:
> On Sun, Nov 12, 2023 at 10:57:47PM -0500, John Hubbard wrote:
>>
>> I've done some initial performance testing of this patchset on an arm64
>> SBSA server. When these patches are combined with the arm64 arch contpte
>> patches in Ryan's git tree (he has conveniently combined everything
>> here: [1]), we are seeing a remarkable, consistent speedup of 10.5x on
>> some memory-intensive workloads. Many test runs, conducted independently
>> by different engineers and on different machines, have convinced me and
>> my colleagues that this is an accurate result.
>>
>> In order to achieve that result, we used the git tree in [1] with
>> the following settings:
>>
>>     echo always >/sys/kernel/mm/transparent_hugepage/enabled
>>     echo recommend >/sys/kernel/mm/transparent_hugepage/anon_orders
>>
>> This was on an aarch64 machine configured to use a 64KB base page size.
>> That configuration means that the PMD size is 512MB, which is of course
>> too large for practical use as a pure PMD-THP. However, with these
>> small-size (less than PMD-sized) THPs, we get the improvements in TLB
>> coverage, while still getting pages that are small enough to be
>> effectively usable.
>
> That is quite remarkable!

Yes, agreed - thanks for sharing these results! A very nice Monday morning
boost!

>
> My hope is to abolish the 64kB page size configuration. ie instead of
> using the mixture of page sizes that you currently are -- 64k and
> 1M (right? Order-0, and order-4)

Not quite; the contpte-size for a 64K page size is 2M/order-5. (And yes, it is
64K/order-4 for a 4K page size, and 2M/order-7 for a 16K page size. I agree
that intuitively you would expect the order to remain constant, but it
doesn't.)

The "recommend" setting above will actually enable order-3 as well even though
there is no HW benefit to this. So the full set of available memory sizes here
is:

64K/order-0, 512K/order-3, 2M/order-5, 512M/order-13

> , that 4k, 64k and 2MB (order-0,
> order-4 and order-9) will provide better performance.
>
> Have you run any experiments with a 4kB page size?

Agree that would be interesting with 64K small-sized THP enabled. And I'd love
to get to a world where we universally deal in variable sized chunks of memory,
aligned on 4K boundaries.

In my experience though, there are still some performance benefits to 64K base
page vs 4K+contpte; the page tables are more cache efficient for the former
case - 64K of memory is described by 8 bytes in the former vs 8x16=128 bytes in
the latter. In practice the HW will still only read 8 bytes in the latter but
that's taking up a full cache line vs the former where a single cache line
stores 8x 64K entries.

Thanks,
Ryan
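
For anyone wanting to check the size/order and page-table footprint figures in this mail, here is a minimal sketch of the arithmetic. It is not kernel code. The contpte block sizes are taken from the text above; the 8-byte PTE matches the "8 bytes" figure in the mail; the 64-byte cache line and the "one page table fills exactly one base page" rule used for the PMD size are assumptions, and all function and constant names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

PTE_BYTES = 8          # per-entry size, matching the "8 bytes" figure in the mail
CACHE_LINE_BYTES = 64  # assumption: cache line size is not stated in the mail

# contpte block size for each base page size, as quoted in the mail
CONTPTE_SIZE = {4 * KIB: 64 * KIB, 16 * KIB: 2 * MIB, 64 * KIB: 2 * MIB}


def order_of(size, base_page):
    """Return n such that size == base_page << n."""
    pages, rem = divmod(size, base_page)
    if rem or pages < 1 or pages & (pages - 1):
        raise ValueError("size must be a power-of-two multiple of base_page")
    return pages.bit_length() - 1


def pmd_size(base_page):
    """Memory mapped by one PMD entry, assuming a PTE table fills one base page."""
    return (base_page // PTE_BYTES) * base_page


def pte_bytes_for(span, base_page):
    """Bytes of PTEs needed to describe `span` bytes of memory."""
    return (span // base_page) * PTE_BYTES


def coverage_per_cache_line(base_page):
    """Memory described by the PTEs that fit in one (assumed) cache line."""
    return (CACHE_LINE_BYTES // PTE_BYTES) * base_page


if __name__ == "__main__":
    for base, cont in CONTPTE_SIZE.items():
        print(f"base {base // KIB}K: contpte order {order_of(cont, base)}, "
              f"PMD order {order_of(pmd_size(base), base)}, "
              f"PTE bytes per 64K {pte_bytes_for(64 * KIB, base)}, "
              f"coverage per cache line {coverage_per_cache_line(base) // KIB}K")
```

Under these assumptions, the order of a contpte block differs per base page size because the block size does not scale with the page size, which is the point made in the reply to the "order-4" guess.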