From: Ryan Roberts
Date: Wed, 16 Oct 2024 09:23:48 +0100
Subject: Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64
To: Michael Kelley, Andrew Morton, Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Greg Marsden, Ivan Ivanov, Kalesh Singh, Marc Zyngier, Mark Rutland, Matthias Brugger, Miroslav Benes, Will Deacon, Dexuan Cui, Boqun Feng
Cc: "linux-arm-kernel@lists.infradead.org", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org"
Message-ID: <0369a16f-9298-4a38-bfb9-ee7caa95b976@arm.com>
References: <20241014105514.3206191-1-ryan.roberts@arm.com>

On 15/10/2024 19:38, Michael Kelley wrote:
> From: Ryan Roberts Sent: Monday, October 14, 2024 3:55 AM
>>
>> Hi All,
>>
>> Patch bomb incoming... This covers many subsystems, so I've included a core set
>> of people on the full series and additionally included maintainers on relevant
>> patches. I haven't included those maintainers on this cover letter since the
>> numbers were far too big for it to work.
>> But I've included a link to this cover letter on each patch, so they can
>> hopefully find their way here. For follow-up submissions I'll break it up by
>> subsystem, but for now thought it was important to show the full picture.
>>
>> This RFC series implements support for boot-time page size selection within the
>> arm64 kernel. arm64 supports 3 base page sizes (4K, 16K, 64K), but to date, page
>> size has been selected at compile-time, meaning the size is baked into a given
>> kernel image. As use of larger-than-4K page sizes becomes more prevalent this
>> starts to present a problem for distributions. Boot-time page size selection
>> enables the creation of a single kernel image, which can be told which page size
>> to use on the kernel command line.
>>
>> Why is having an image-per-page size problematic?
>> =================================================
>>
>> Many traditional distros are now supporting both 4K and 64K. And this means
>> managing 2 kernel packages, along with drivers for each. For some, it means
>> multiple installer flavours and multiple ISOs. All of this adds up to a
>> less-than-ideal level of complexity. Additionally, Android now supports 4K and
>> 16K kernels. I'm told having to explicitly manage their KABI for each kernel is
>> painful, and the extra flash space required for both kernel images and the
>> duplicated modules has been problematic. Boot-time page size selection solves
>> all of this.
>>
>> Additionally, in starting to think about the longer term deployment story for
>> D128 page tables, which the Arm architecture now supports, a lot of the same
>> problems need to be solved, so this work sets us up nicely for that.
>>
>> So what's the down side?
>> ========================
>>
>> Well, nothing's free; various static allocations in the kernel image must be
>> sized for the worst case (largest supported page size), so the image size is in
>> line with that of a 64K compile-time image.
>> So if you're interested in 4K or 16K, there is a slight increase to the image
>> size. But I expect that problem goes away if you're compressing the image -
>> it's just some extra zeros. At boot-time, I expect we could free the unused
>> static storage once we know the page size - although that would be a follow-up
>> enhancement.
>>
>> And then there is performance. Since PAGE_SIZE and friends are no longer
>> compile-time constants, we must look up their values and do arithmetic at
>> runtime instead of compile-time. My early perf testing suggests this is
>> imperceptible for real-world workloads, and has only a small impact on
>> microbenchmarks - more on this below.
>
> [snip]
>
> This is pretty cool. :-) FWIW, I've built a kernel with this patch set, and
> have it running in a RHEL 8.7 guest on Hyper-V in the Azure public cloud.
> Ran with 4K, 16K, and 64K page sizes, and the basic smoke tests work.

That's great to hear - thanks for taking the time to test!

>
> The Hyper-V specific code in the Linux kernel needed a few tweaks to
> deal with PAGE_SIZE and friends no longer being constant, but it's nothing
> significant. Getting the kernel built in the first place was a little harder
> because my .config file is fairly generic with a lot of device drivers and file
> system code that aren't really needed for Hyper-V guests. I had to
> weed out the ones that won't build. My RHEL 8.7 install uses LVM, so I
> hacked the 'dm' code to make it compile and run.

Yeah, getting all this sorted is going to be the long tail. I feel I've had
enough positive response to this RFC that I should probably just get on and start
that work to get a real feel for how much of it there is going to be.

>
> As this work moves forward, I can supply the necessary patches for
> the Hyper-V support. Let me know if you want to include them in the
> main patch set.

Great! If you are happy to forward them to me, I'll include them in future
versions of the series (or more likely, serieses).

Thanks,
Ryan

>
> I've added a couple of Microsoft's Linux people to this email's addressee
> list so they are aware of what's going on.
>
> Michael Kelley
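
The quoted cover letter makes two points that a small example may clarify: PAGE_SIZE stops being a compile-time constant, and static allocations have to be sized for the largest supported page size. Below is a minimal userspace sketch in C under stated assumptions. It is not code from the series, and every identifier in it (page_shift, page_size(), page_align_up(), select_page_shift(), PAGE_SHIFT_MIN, PAGE_SHIFT_MAX, PAGE_SIZE_MAX, scratch_page) is hypothetical.

```c
#include <assert.h>	/* static_assert (C11) */

/* Hypothetical bounds: 4K is the smallest base page size, 64K the largest. */
#define PAGE_SHIFT_MIN	12
#define PAGE_SHIFT_MAX	16
#define PAGE_SIZE_MAX	(1UL << PAGE_SHIFT_MAX)

static_assert(PAGE_SHIFT_MIN <= PAGE_SHIFT_MAX, "page shift bounds inverted");

/* Chosen once at "boot" in this sketch; defaults to 4K. */
static unsigned int page_shift = PAGE_SHIFT_MIN;

/* Formerly a constant: now a load plus a shift at every use. */
static inline unsigned long page_size(void)
{
	return 1UL << page_shift;
}

/* Arithmetic on the page size also moves from compile time to runtime. */
static inline unsigned long page_align_up(unsigned long addr)
{
	unsigned long mask = page_size() - 1;

	return (addr + mask) & ~mask;
}

/*
 * Static storage cannot depend on a runtime value, so it is sized for
 * the worst case (64K) even if 4K or 16K ends up being selected.
 */
unsigned char scratch_page[PAGE_SIZE_MAX];

/* Accept only the three base page sizes arm64 supports: 4K, 16K, 64K. */
int select_page_shift(unsigned int shift)
{
	if (shift != 12 && shift != 14 && shift != 16)
		return -1;
	page_shift = shift;
	return 0;
}
```

After select_page_shift(14), page_align_up(1) returns 16384, while sizeof(scratch_page) stays at 65536 whatever was selected. That mirrors the image-size cost described in the cover letter.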