From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4c991dcb-c5bb-86bb-5a29-05df24429607@arm.com>
Date: Wed, 15 Feb 2023 12:38:13 +0000
To: Matthew Wilcox
From: Ryan Roberts
Cc: Linux-MM, Catalin Marinas, Mark Rutland, Ruben Ayrapetyan
Subject: Folios for anonymous memory

Hi Matthew, all,

I’ve recently been looking into some potential performance
improvements, and think that folios could help with making these
improvements a reality. I’m hoping that you can answer some questions
to help figure out if this makes sense.

First a quick summary of my bench-marking; I’ve been running a Kernel
Compilation test as well as the Speedometer browser performance
benchmark (among others), while trying to better understand the impact
of page size on both HW and SW. To do this, I’ve hacked the arm64 arch
code to separate the HW page size (4K) from the kernel page size (16K).
Then I ran 3 kernels (baseline-4k, baseline-16k, and my hacked up
hybrid-16k-4k) - all based on v6.1 - with the aim of determining the
speedups due solely to SW overhead reduction (baseline-4k ->
hybrid-16k-4k), and the speedups due to HW overhead reduction (the
baseline-4k -> baseline-16k speedup minus the baseline-4k ->
hybrid-16k-4k speedup). Results as follows:

Kernel Compilation:
  Speed up due to SW overhead reduction:   6.5%
  Speed up due to HW overhead reduction:   5.0%
  Total speed up:                         11.5%

Speedometer 2.0:
  Speed up due to SW overhead reduction:   5.3%
  Speed up due to HW overhead reduction:   5.1%
  Total speed up:                         10.4%

Digging into the reasons for the SW-side speedup, it boils down to less
book-keeping: 4x fewer page faults and 4x fewer pages to manage
locks/refcounts/… for, which leads to faster abort and syscall
handling. I think these phenomena are well understood in the folio
context? For these workloads, though, the memory is primarily
anonymous.

I’d like to figure out how to realise some of these benefits in a
kernel that still maintains a 4K page user ABI. From reading old
threads and LWN, and watching Matthew’s talk at OSS last summer, it
sounds like this is exactly what folios are intended to solve?

So a few questions:

- I’ve seen folios for anon memory listed as future work; what’s the
  current status? Is anyone looking at this? It’s something that I
  would be interested to take a look at if not (although don’t take
  that as an actual commitment yet!).

- My understanding is that as of v6.0, at least, XFS was the only FS
  supporting large folios? Has that picture changed? Is there any
  likelihood of seeing ext4 and f2fs support anytime soon?

- Matthew mentioned in the talk that he had data showing memory
  fragmentation becoming less of an issue as more users were allocating
  large folios. Is that data or the experimental approach public?

Thanks,
Ryan