From: Ryan Roberts
To: Zi Yan, Matthew Wilcox
Cc: Andrew Morton, David Hildenbrand, Dave Chinner, Catalin Marinas,
    Will Deacon, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH v3] mm/filemap: Allow arch to request folio size for exec memory
Date: Fri, 28 Mar 2025 13:09:49 +0000
References: <20250327160700.1147155-1-ryan.roberts@arm.com>
 <731D8D6E-52A0-4144-A2BB-7243BFACC92D@nvidia.com>
In-Reply-To: <731D8D6E-52A0-4144-A2BB-7243BFACC92D@nvidia.com>

On 27/03/2025 20:07, Zi Yan wrote:
> On 27 Mar 2025, at 12:44, Matthew Wilcox wrote:
>
>> On Thu, Mar 27, 2025 at 04:06:58PM +0000, Ryan Roberts wrote:
>>> So let's special-case the read(ahead) logic for executable mappings. The
>>> trade-off is performance improvement (due to more efficient storage of
>>> the translations in iTLB) vs potential read amplification (due to
>>> reading too much data around the fault which won't be used), and the
>>> latter is independent of base page size. I've chosen 64K folio size for
>>> arm64 which benefits both the 4K and 16K base page size configs and
>>> shouldn't lead to any read amplification in practice since the old
>>> read-around path was (usually) reading blocks of 128K. I don't
>>> anticipate any write amplification because text is always RO.
>>
>> Is there not also the potential for wasted memory due to ELF alignment?
>> Kalesh talked about it in the MM BOF at the same time that Ted and I
>> were discussing it in the FS BOF. Some coordination required (like
>> maybe Kalesh could have mentioned it to me rather than assuming I'd be
>> there?)
>>
>>> +#define arch_exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
>>
>> I don't think the "arch" really adds much value here.
>>
>> #define exec_folio_order() get_order(SZ_64K)
>
> How about AMD’s PTE coalescing, which does PTE compression at
> 16KB or 32KB level? It covers 4 16KB and 2 32KB, at least it will
> not hurt AMD PTE coalescing. Starting with 64KB across all arch
> might be simpler to see the performance impact. Just a comment,
> no objection. :)

exec_folio_order() is defined per-architecture, and SZ_64K is the size arm64
prefers. At the moment x86 is not opted in, but it could choose to opt in with
32K (or whatever else makes sense) if the HW supports coalescing.

I'm not sure whether you thought this was global and are arguing against that,
or whether you are arguing for it to be global, since having x86 do this too
would show us any performance regressions earlier?

>
> Best Regards,
> Yan, Zi