From: Ryan Roberts
Date: Mon, 22 Jul 2024 10:43:25 +0100
Subject: Re: [RFC PATCH v1 0/4] Control folio sizes used for page cache memory
To: Daniel Gomez
Cc: David Hildenbrand, Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)", Barry Song, Lance Yang, Baolin Wang, Gavin Shan, Pankaj Raghav, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Message-ID: <2950ddfa-32a5-4987-9c05-05ce86a53e17@arm.com>
References: <20240717071257.4141363-1-ryan.roberts@arm.com> <99b33a29-e97a-4932-8d7a-85bc01885d18@arm.com>

On 22/07/2024 10:35, Daniel Gomez wrote:
> On Wed, Jul 17, 2024 at 11:45:48AM GMT, Ryan Roberts wrote:
>> On 17/07/2024 11:31, David Hildenbrand wrote:
>>> On 17.07.24 09:12, Ryan Roberts wrote:
>>>> Hi All,
>>>>
>>>> This series is an RFC that adds sysfs and kernel cmdline controls to configure
>>>> the set of allowed large folio sizes that can be used when allocating
>>>> file-memory for the page cache.
>>>> As part of the control mechanism, it provides
>>>> for a special-case "preferred folio size for executable mappings" marker.
>>>>
>>>> I'm trying to solve 2 separate problems with this series:
>>>>
>>>> 1. Reduce pressure in iTLB and improve performance on arm64: This is a modified
>>>> approach for the change at [1]. Instead of hardcoding the preferred executable
>>>> folio size into the arch, user space can now select it. This decouples the arch
>>>> code and also makes the mechanism more generic; it can be bypassed (the default)
>>>> or any folio size can be set. For my use case, 64K is preferred, but I've also
>>>> heard from Willy of a use case where putting all text into 2M PMD-sized folios
>>>> is preferred. This approach avoids the need for synchronous MADV_COLLAPSE (and
>>>> therefore faulting in all text ahead of time) to achieve that.
>>>>
>>>> 2. Reduce memory fragmentation in systems under high memory pressure (e.g.
>>>> Android): The theory goes that if all folios are 64K, then failure to allocate a
>>>> 64K folio should become unlikely. But if the page cache is allocating lots of
>>>> different orders, with most allocations having an order below 64K (as is the
>>>> case today), then the ability to allocate 64K folios diminishes. By providing
>>>> control over the allowed set of folio sizes, we can tune to avoid crucial 64K
>>>> folio allocation failures. Additionally I've heard (second hand) of the need to
>>>> disable large folios in the page cache entirely due to latency concerns in some
>>>> settings. These controls allow all of this without kernel changes.
>>>>
>>>> The value of (1) is clear and the performance improvements are documented in
>>>> patch 2. I don't yet have any data demonstrating the theory for (2) since I
>>>> can't reproduce the setup that Barry had at [2].
>>>> But my view is that by adding
>>>> these controls we will enable the community to explore further, in the same way
>>>> that the anon mTHP controls helped harden the understanding for anonymous
>>>> memory.
>>>>
>>>> ---
>>>
>>> How would this interact with other requirements we get from the filesystem (for
>>> example, because of the device) [1].
>>>
>>> Assuming a device has a filesystem that has a min order of X, but we disable
>>> anything >= X, how would we combine that configuration/information?
>>
>> Currently order-0 is implicitly the "always-on" fallback order. My thinking was
>> that with [1], the specified min order just becomes that "always-on" fallback order.
>>
>> Today:
>>
>>   orders = file_orders_always() | BIT(0);
>>
>> Tomorrow:
>>
>>   orders = (file_orders_always() & ~(BIT(min_order) - 1)) | BIT(min_order);
>>
>> That does mean that in this case, a user-disabled order could still be used. So
>> the controls are really hints rather than definitive commands.
>
> In the scenario where a min order is not enabled in hugepages-kB/
> file_enabled, will the user still be allowed to automatically mkfs/mount with
> blocksize=min_order, and will sysfs reflect this? Or, since it's a hint, will it
> remain hidden but still allow mkfs/mount to proceed?

My proposal is that the controls are hints, and they would not block mounting a
file system. As an example, the user may set
`/sys/kernel/mm/transparent_hugepage/hugepages-16kB/file_enabled` to `never`. In
this case the kernel would never pick a 16K folio to back a file whose minimum
folio size is not 16K. If the file's minimum folio size is 16K, then it would
still allocate that folio size in the fallback case, after trying any
appropriate bigger folio sizes that are set to `always`.

Thanks,
Ryan

>>> [1] https://lore.kernel.org/all/20240715094457.452836-2-kernel@pankajraghav.com/T/#u
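The order-mask arithmetic in the "Today"/"Tomorrow" snippets, together with the fallback behaviour described in the reply, can be sketched as a small standalone C translation unit. This is a minimal sketch, not the kernel's page-cache allocation path. It assumes 4K base pages (so order 2 = 16K, order 4 = 64K, order 9 = 2M). The helper names orders_today(), orders_tomorrow() and pick_order(), the SKETCH_MAX_ORDER bound, and the max_free_order fragmentation model are all hypothetical. file_orders_always() from the email is replaced by a plain `always` bitmask parameter.

```c
#include <assert.h>

/* Mirrors the kernel's BIT() macro: a mask with only bit nr set. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical upper bound for this sketch: order 9 = 2M (PMD size with 4K pages). */
#define SKETCH_MAX_ORDER 9

/* "Today": user-enabled orders, plus order-0 as the implicit always-on fallback. */
static unsigned long orders_today(unsigned long always)
{
	return always | BIT(0);
}

/*
 * "Tomorrow": clear every order below the file's min_order, then force
 * min_order itself on, even if the user disabled it (the controls are hints).
 */
static unsigned long orders_tomorrow(unsigned long always, unsigned int min_order)
{
	assert(min_order <= SKETCH_MAX_ORDER);
	return (always & ~(BIT(min_order) - 1)) | BIT(min_order);
}

/*
 * Try allowed orders from largest to smallest and return the first one the
 * allocator can satisfy, or -1 if none can. max_free_order is a crude,
 * hypothetical stand-in for fragmentation: any order up to it "succeeds".
 */
static int pick_order(unsigned long orders, unsigned int max_free_order)
{
	for (int order = SKETCH_MAX_ORDER; order >= 0; order--) {
		if ((orders & BIT(order)) && (unsigned int)order <= max_free_order)
			return order;
	}
	return -1;
}
```

For example, with 64K and 2M enabled (`BIT(4) | BIT(9)`) and a 16K minimum (`min_order = 2`), orders_tomorrow() yields `BIT(2) | BIT(4) | BIT(9)`. pick_order() then falls back to order 2 only when neither larger size can be allocated. Passing `min_order = 0` reproduces the "Today" mask, which is why the min order can be described as simply replacing order-0 as the fallback.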