From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 20 Mar 2025 07:38:05 +1100
From: Dave Chinner
To: Yang Shi
Cc: Ryan Roberts, lsf-pc@lists.linux-foundation.org, Linux-MM,
    Matthew Wilcox, Barry Song <21cnbao@gmail.com>
Subject: Re: [LSF/MM/BPF TOPIC] Mapping text with large folios
References: <6201267f-6d3a-4942-9a61-371bd41d633d@arm.com>

On Wed, Mar 19, 2025 at 11:16:16AM -0700, Yang Shi wrote:
> On Wed, Mar 19, 2025 at 8:39 AM Ryan Roberts wrote:
> >
> > Hi All,
> >
> > I know this is very last minute, but I was hoping that it might be possible to
> > squeeze in a session to discuss the following?

I'm not going to be at LSFMM, so I'd prefer this sort of thing get
discussed on the dev lists...

> > Summary/Background:
> >
> > On arm64, physically contiguous and naturally aligned regions can take advantage
> > of contpte mappings (e.g. 64 KB) to reduce iTLB pressure. However, for file
> > regions containing text, current readahead behaviour often yields small,
> > misaligned folios, preventing this optimization. This proposal introduces a
> > special-case path for executable mappings, performing synchronous reads of an
> > architecture-chosen size into large folios (64 KB on arm64). Early performance
> > tests on real-world workloads (e.g. nginx, redis, kernel compilation) show ~2-9%
> > gains.
>
> AFAIK, MySQL is quite sensitive to iTLB pressure. It should be worth
> adding to the tests.
>
> >
> > I’ve previously posted attempts to enable this performance improvement ([1],
> > [2]), but there were objections and conversation fizzled out.
> > Now that I have more compelling performance data, I’m hoping there is now
> > stronger justification, and we can find a path forwards.
> >
> > What I’d Like to Cover:
> >
> >  - Describe how text memory should ideally be mapped and why it benefits
> >    performance.

I think the main people involved already understand this...

> >  - Brief review of performance data.

You don't need to convince me - there's 3 decades of evidence
proving that larger, fewer page table mappings for executables
result in better performance.

> >  - Discuss options for the best way to encourage text into large folios:
> >      - Let the architecture request a preferred size
> >      - Extend VMA attributes to include preferred THP size hint
> >      - Provide a sysfs knob
> >      - Plug into the “mapping min folio order” infrastructure
> >      - Other approaches?

Implement generic large folio/sequential PTE mapping optimisations
for each platform, then control it by letting the filesystem decide
what the desired mapping order and alignment should be for any given
inode mapping tree.

> Did you try LBS? You can have 64K block size with LBS, it should
> create large folios for page cache so text should get large folios
> automatically (IIRC arm64 linker script has 64K alignment by default).

We really don't want people using 64kB block size filesystems for
root filesystems - there are plenty of downsides to using huge block
sizes for filesystems that generally hold many tiny files.

However, I agree with the general principle that the fs should be
directing the inode mapping tree folio order behaviour. i.e. the
filesystem already sets both the floor and the desired behaviour for
folio instantiation for any given inode mapping tree.

It also needs to be able to instantiate large folios -before- the
executable is mapped into VMAs via mmap() because files can be read
into cache before they are run (e.g. boot time readahead hacks). i.e.
a mmap() time directive is too late to apply to the inode mapping
tree to guarantee optimal layout for PTE optimisation. It also may
not be possible to apply mmap() time directives due to other
filesystem constraints, so mmap() time directives may well end up
being unpredictable and unreliable....

There's also an obvious filesystem level trigger for enabling this
behaviour in a generic manner. e.g. The filesystem can look at the X
perm bits on the inode at instantiation time and if they are set,
set a "desired order" value+flag on the mapping at inode cache
instantiation in addition to "min order".

If a desired order is configured, the page cache read code can then
pass a FGP_TRY_ORDER flag with the fgp_order set to the desired
value to folio allocation. If that can't be allocated then it can
fall back to single page folios instead of failing.

At this point, we will always optimistically try to allocate larger
folios for executables on all architectures. Architectures that can
optimise sequential PTE mappings can then simply add generic support
for large folio optimisation, and more efficient executable mappings
simply fall out of the generic support for efficient mapping of
large folios and filesystems preferring large folios for executable
inode mappings....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
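
For readers who want the control flow spelled out, here is a rough
userspace model of the scheme described above. It is only a sketch
and not kernel code: FGP_TRY_ORDER and the "desired order" hint are
proposals in this mail rather than existing interfaces, and the names
Mapping, init_mapping, alloc_folio_order and try_alloc are made up
for illustration. It also assumes 4 KB base pages, that only regular
files get the hint, and that the fallback lands on the mapping's min
order (which is a single page when min order is 0).

```python
import stat
from dataclasses import dataclass
from typing import Callable, Optional

PAGE_SHIFT = 12  # assumption: 4 KB base pages, so order 4 is 64 KB


@dataclass
class Mapping:
    """Toy stand-in for the folio-order policy of an inode mapping tree."""
    min_order: int = 0                   # floor the filesystem sets
    desired_order: Optional[int] = None  # hypothetical "desired order" hint


def init_mapping(mode: int, min_order: int, exec_order: int) -> Mapping:
    """Model inode cache instantiation: set the hint if any X bit is set."""
    mapping = Mapping(min_order=min_order)
    has_x_bit = mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Assumption: only regular files count, directories also carry X bits.
    if stat.S_ISREG(mode) and has_x_bit:
        mapping.desired_order = max(exec_order, min_order)
    return mapping


def alloc_folio_order(mapping: Mapping, try_alloc: Callable[[int], bool]) -> int:
    """Try the desired order first, then fall back to the minimum order.

    try_alloc(order) stands in for the page allocator and returns True
    when a folio of that order could be allocated.
    """
    if mapping.desired_order is not None and try_alloc(mapping.desired_order):
        return mapping.desired_order
    if not try_alloc(mapping.min_order):
        raise MemoryError("allocation failed at the minimum order")
    return mapping.min_order


exec_map = init_mapping(stat.S_IFREG | 0o755, min_order=0, exec_order=4)
data_map = init_mapping(stat.S_IFREG | 0o644, min_order=0, exec_order=4)

# Plenty of memory: the executable gets an order-4 folio.
print(alloc_folio_order(exec_map, lambda order: True))
# Fragmented memory: only order 0 succeeds, so the read still completes.
print(alloc_folio_order(exec_map, lambda order: order == 0))
# No X bits, no hint: the mapping stays at its minimum order.
print(alloc_folio_order(data_map, lambda order: True))
```

The point of modelling it this way is that the hint is attached when
the inode is instantiated, not at mmap() time, so anything that reads
the file into the page cache before it is executed sees the same
preference.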