Message-ID: <951a8d96-ecdf-7ca4-ec7a-e1c5eba8bce3@arm.com>
Date: Wed, 2 Aug 2023 10:04:56 +0100
Subject: Re: [PATCH v4 2/5] mm: LARGE_ANON_FOLIO for improved performance
From: Ryan Roberts
To: Yu Zhao
Cc: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
References: <20230726095146.2826796-1-ryan.roberts@arm.com> <20230726095146.2826796-3-ryan.roberts@arm.com> <8c0710e0-a75a-b315-dae1-dd93092e4bd6@arm.com>

On 02/08/2023 09:02, Ryan Roberts wrote:
...
>>>
>>> I've captured run time and peak memory usage, and taken the mean.
>>> The stdev for the peak memory usage is big-ish, but I'm confident this still
>>> captures the central tendency well:
>>>
>>> | MAX_ORDER_UNHINTED |   real-time |   kern-time |   user-time | peak memory |
>>> |:-------------------|------------:|------------:|------------:|:------------|
>>> | 4k                 |        0.0% |        0.0% |        0.0% |        0.0% |
>>> | 16k                |       -3.6% |      -26.5% |       -0.5% |       -0.1% |
>>> | 32k                |       -4.8% |      -37.4% |       -0.6% |       -0.1% |
>>> | 64k                |       -5.7% |      -42.0% |       -0.6% |       -1.1% |
>>> | 128k               |       -5.6% |      -42.1% |       -0.7% |        1.4% |
>>> | 256k               |       -4.9% |      -41.9% |       -0.4% |        1.9% |
>>>
>>> 64K looks like the clear sweet spot to me.

I'm sorry about this; I've concluded that these tests are flawed. While I'm
correctly setting the MAX_ORDER_UNHINTED value in each case, the tests were run
against a 4K base page kernel, which means that its arch_wants_pte_order()
return value is order-4. So for MAX_ORDER_UNHINTED = {64k, 128k, 256k}, the
actual order used is order-4 (=64K):

	order = max(arch_wants_pte_order(), PAGE_ALLOC_COSTLY_ORDER);

	if (!hugepage_vma_check(vma, vma->vm_flags, false, true, true))
		order = min(order, ANON_FOLIO_MAX_ORDER_UNHINTED);

So while I think we can conclude that the performance improves from 4k -> 64k,
and the peak memory is about the same, we can't conclude that 64k is definitely
where performance gains peak or that peak memory increases after this. The error
bars on the memory consumption are fairly big.

I'll rework the tests so that I'm actually measuring what I was intending to
measure and repost in due course.