Message-ID: <23d2f123-ac9f-43b6-9b6e-8a77ea3b9044@arm.com>
Date: Tue, 15 Oct 2024 12:48:51 +0100
Subject: Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64
From: Ryan Roberts
To: Florian Fainelli, Andrew Morton, Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Greg Marsden, Ivan Ivanov, Kalesh Singh, Marc Zyngier, Mark Rutland, Matthias Brugger, Miroslav Benes, Will Deacon
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20241014105514.3206191-1-ryan.roberts@arm.com> <3e742298-2f38-496c-ba63-1e30d16318c6@broadcom.com>
In-Reply-To: <3e742298-2f38-496c-ba63-1e30d16318c6@broadcom.com>

On 14/10/2024 18:32, Florian Fainelli wrote:
> On 10/14/24 03:55, Ryan Roberts wrote:
>> Hi All,
>>
>> Patch bomb incoming... This covers many subsystems, so I've included a core set
>> of people on the full series and additionally included maintainers on relevant
>> patches. I haven't included those maintainers on this cover letter since the
>> numbers were far too big for it to work. But I've included a link to this cover
>> letter on each patch, so they can hopefully find their way here.
>> For follow up submissions I'll break it up by subsystem, but for now thought it
>> was important to show the full picture.
>>
>> This RFC series implements support for boot-time page size selection within the
>> arm64 kernel. arm64 supports 3 base page sizes (4K, 16K, 64K), but to date, page
>> size has been selected at compile-time, meaning the size is baked into a given
>> kernel image. As use of larger-than-4K page sizes becomes more prevalent this
>> starts to present a problem for distributions. Boot-time page size selection
>> enables the creation of a single kernel image, which can be told which page size
>> to use on the kernel command line.
>>
>> Why is having an image-per-page size problematic?
>> =================================================
>>
>> Many traditional distros are now supporting both 4K and 64K. And this means
>> managing 2 kernel packages, along with drivers for each. For some, it means
>> multiple installer flavours and multiple ISOs. All of this adds up to a
>> less-than-ideal level of complexity. Additionally, Android now supports 4K and
>> 16K kernels. I'm told having to explicitly manage their KABI for each kernel is
>> painful, and the extra flash space required for both kernel images and the
>> duplicated modules has been problematic. Boot-time page size selection solves
>> all of this.
>>
>> Additionally, in starting to think about the longer term deployment story for
>> D128 page tables, which Arm architecture now supports, a lot of the same
>> problems need to be solved, so this work sets us up nicely for that.
>>
>> So what's the down side?
>> ========================
>>
>> Well nothing's free; Various static allocations in the kernel image must be
>> sized for the worst case (largest supported page size), so image size is in line
>> with size of 64K compile-time image. So if you're interested in 4K or 16K, there
>> is a slight increase to the image size.
>> But I expect that problem goes away if you're compressing the image - it's just
>> some extra zeros. At boot-time, I expect we could free the unused static storage
>> once we know the page size - although that would be a follow up enhancement.
>>
>> And then there is performance. Since PAGE_SIZE and friends are no longer
>> compile-time constants, we must look up their values and do arithmetic at
>> runtime instead of compile-time. My early perf testing suggests this is
>> imperceptible for real-world workloads, and only has small impact on
>> microbenchmarks - more on this below.
>>
>> Approach
>> ========
>>
>> The basic idea is to rid the source of any assumptions that PAGE_SIZE and
>> friends are compile-time constant, but in a way that allows the compiler to
>> perform the same optimizations as were previously being done if they do turn out
>> to be compile-time constant. Where constants are required, we use limits;
>> PAGE_SIZE_MIN and PAGE_SIZE_MAX. See commit log in patch 1 for full description
>> of all the classes of problems to solve.
>>
>> By default PAGE_SIZE_MIN=PAGE_SIZE_MAX=PAGE_SIZE. But an arch may opt-in to
>> boot-time page size selection by defining PAGE_SIZE_MIN & PAGE_SIZE_MAX. arm64
>> does this if the user selects the CONFIG_ARM64_BOOT_TIME_PAGE_SIZE Kconfig,
>> which is an alternative to selecting a compile-time page size.
>>
>> When boot-time page size is active, the arch pgtable geometry macro definitions
>> resolve to something that can be configured at boot. The arm64 implementation in
>> this series mainly uses global, __ro_after_init variables. I've tried using
>> alternatives patching, but that performs worse than loading from memory; I think
>> due to code size bloat.
>
> FWIW, this paragraph was not entirely clear to me until I looked at patch 57 to
> see that the compile time page size selection had been retained, and could
> continue to be used as-is.
> It was somewhat implicit, but not IMHO explicit enough, not a big deal though.

I intended to make that bit clear with the above sentence "arm64 does this if
the user selects the CONFIG_ARM64_BOOT_TIME_PAGE_SIZE Kconfig, which is an
alternative to selecting a compile-time page size.", but appreciate there is a
lot going on here.

>
> Great work, thanks for doing that! This makes me wonder if we could leverage any
> of that to have a single kernel supporting both LPAE and !LPAE on ARM 32-bit,
> but that still seems somewhat more difficult, largely due to the difference
> in the page table descriptor format (long vs. short).

We will eventually have the exact same problem with FEAT_D128 on arm64. This
introduces page tables with 128 bit PTEs. Ideally we would like to support both
in a single image, although we have much more thinking to do on that. But my
current view is that this series solves a bunch of problems, which makes it
easier (PTRS_PER_Pxx and Pxx_SHIFT all become boot-time values, for example, so
we can easily represent the different geometries).

Yes, we still need to solve the PTE size difference (in our case 64-bit vs
128-bit). I have a couple of proposals for how to do that; the "gold-plated"
approach would be to create and use a handle type to represent a PTE/PxD slot in
a table. Then increments/decrements would be enforced via explicit helpers that
know the size, and direct dereferencing would be impossible. When accessing via
helpers we would pass around pte_t/pxd_t values that are the larger size, then
narrow them when writing back. Anshuman has a series [1] that starts to move in
that direction.

If you have any other ideas, it would be good to talk!

[1] https://lore.kernel.org/linux-mm/20240917073117.1531207-1-anshuman.khandual@arm.com/

Thanks,
Ryan
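
For illustration only, the handle-type idea described above might look roughly
like the user-space sketch below. Everything in it is an assumption made for
the example: the names (pte_slot_t, pte_val_t, pte_format_init() and the
pte_slot_*() helpers) are hypothetical and are not taken from this series,
from Anshuman's series, or from the kernel. It only shows the shape of the
approach: table walking goes through helpers that know the entry size, values
are carried at the larger width, and they are narrowed on write-back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: entry size in bytes, fixed once at "boot" (8 or 16). */
size_t pte_entry_size = 8;

/* Wide value type, large enough for either descriptor format. */
typedef struct { uint64_t lo, hi; } pte_val_t;

/* Handle to one slot in a table; callers never dereference it directly. */
typedef struct { unsigned char *p; } pte_slot_t;

/* Select the descriptor format: non-zero means 128-bit entries. */
void pte_format_init(int d128)
{
    pte_entry_size = d128 ? 16 : 8;
}

/* Get a handle to entry 'index' of 'table', scaled by the entry size. */
pte_slot_t pte_slot_at(void *table, size_t index)
{
    pte_slot_t s = { (unsigned char *)table + index * pte_entry_size };
    return s;
}

/* Advance to the next slot; the helper knows the stride, the caller doesn't. */
pte_slot_t pte_slot_next(pte_slot_t s)
{
    s.p += pte_entry_size;
    return s;
}

/* Read a slot, widening a 64-bit entry to the larger value type. */
pte_val_t pte_slot_read(pte_slot_t s)
{
    pte_val_t v = { 0, 0 };
    memcpy(&v, s.p, pte_entry_size); /* copies 'lo' only, or 'lo' and 'hi' */
    return v;
}

/* Write a slot, narrowing to 64 bits when the small format is active. */
void pte_slot_write(pte_slot_t s, pte_val_t v)
{
    /* In the 64-bit format the upper half must be empty to narrow safely. */
    assert(pte_entry_size == 16 || v.hi == 0);
    memcpy(s.p, &v, pte_entry_size);
}
```

A caller would walk a table with pte_slot_at() and pte_slot_next() rather than
pointer arithmetic on pte_t *, so the same loop works for either entry size. A
real implementation would also need atomic accessors and a type the compiler
can treat as opaque, neither of which this sketch attempts.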