From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2c4b2a41-1c98-0782-ac30-80e65bdb2b0c@arm.com>
Date: Mon, 17 Jul 2023 14:20:31 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance
To: David Hildenbrand, Yu Zhao
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei,
 Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying",
 Zi Yan, Luis Chamberlain, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230714160407.4142030-1-ryan.roberts@arm.com>
 <20230714161733.4144503-3-ryan.roberts@arm.com>
 <82c934af-a777-3437-8d87-ff453ad94bfd@redhat.com>
From: Ryan Roberts
In-Reply-To: <82c934af-a777-3437-8d87-ff453ad94bfd@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender:
owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org

On 17/07/2023 14:06, David Hildenbrand wrote:
> On 14.07.23 19:17, Yu Zhao wrote:
>> On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts wrote:
>>>
>>> Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
>>> allocated in large folios of a determined order. All pages of the large
>>> folio are pte-mapped during the same page fault, significantly reducing
>>> the number of page faults. The number of per-page operations (e.g. ref
>>> counting, rmap management, lru list management) is also significantly
>>> reduced since those ops now become per-folio.
>>>
>>> The new behaviour is hidden behind the new FLEXIBLE_THP Kconfig, which
>>> defaults to disabled for now; the long term aim is for this to default to
>>> enabled, but there are some risks around internal fragmentation that
>>> need to be better understood first.
>>>
>>> When enabled, the folio order is determined as such: For a vma, process
>>> or system that has explicitly disabled THP, we continue to allocate
>>> order-0. THP is most likely disabled to avoid any possible internal
>>> fragmentation so we honour that request.
>>>
>>> Otherwise, the return value of arch_wants_pte_order() is used. For vmas
>>> that have not explicitly opted-in to use transparent hugepages (e.g.
>>> where thp=madvise and the vma does not have MADV_HUGEPAGE), then
>>> arch_wants_pte_order() is limited by the new cmdline parameter,
>>> `flexthp_unhinted_max`. This allows for a performance boost without
>>> requiring any explicit opt-in from the workload while allowing the
>>> sysadmin to tune between performance and internal fragmentation.
>>>
>>> arch_wants_pte_order() can be overridden by the architecture if desired.
>>> Some architectures (e.g. arm64) can coalesce TLB entries if a contiguous
>>> set of ptes map physically contiguous, naturally aligned memory, so this
>>> mechanism allows the architecture to optimize as required.
>>>
>>> If the preferred order can't be used (e.g. because the folio would
>>> breach the bounds of the vma, or because ptes in the region are already
>>> mapped) then we fall back to a suitable lower order; first
>>> PAGE_ALLOC_COSTLY_ORDER, then order-0.
>>>
>>> Signed-off-by: Ryan Roberts
>>> ---
>>>   .../admin-guide/kernel-parameters.txt         |  10 +
>>>   mm/Kconfig                                    |  10 +
>>>   mm/memory.c                                   | 187 ++++++++++++++++--
>>>   3 files changed, 190 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>>> b/Documentation/admin-guide/kernel-parameters.txt
>>> index a1457995fd41..405d624e2191 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1497,6 +1497,16 @@
>>>                          See Documentation/admin-guide/sysctl/net.rst for
>>>                          fb_tunnels_only_for_init_ns
>>>
>>> +       flexthp_unhinted_max=
>>> +                       [KNL] Requires CONFIG_FLEXIBLE_THP enabled. The maximum
>>> +                       folio size that will be allocated for an anonymous vma
>>> +                       that has neither explicitly opted in nor out of using
>>> +                       transparent hugepages. The size must be a power-of-2 in
>>> +                       the range [PAGE_SIZE, PMD_SIZE). A larger size improves
>>> +                       performance by reducing page faults, while a smaller
>>> +                       size reduces internal fragmentation. Default: max(64K,
>>> +                       PAGE_SIZE). Format: size[KMG].
>>> +
>>
>> Let's split this parameter into a separate patch.
>>
>
> Just a general comment after stumbling over patch #2, let's not start splitting
> patches into things that don't make any sense on their own; that just makes
> review a lot harder.
ACK

>
> For this case here, I'd suggest first adding the general infrastructure and then
> adding tunables we want to have on top.

OK, so 1 patch for the main infrastructure, then a patch to disable for
MADV_NOHUGEPAGE and friends, then a further patch to set flexthp_unhinted_max
via a sysctl?

>
> I agree that toggling that at runtime (for example via sysfs as raised by me
> previously) would be nicer.

OK, I clearly misunderstood; I thought you were requesting a boot parameter.
What's the ABI compat guarantee for sysctls? I assumed that a boot parameter
would be easier to remove in future if we wanted, but a sysctl, once added,
is there forever?

Also, how do you feel about the naming and behavior of the parameter?

>
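
For anyone skimming the thread, the order selection described in the quoted
commit message can be sketched in userspace C roughly as below. This is an
illustration under stated assumptions, not the code from the patch: every
identifier here (sketch_vma, thp_hint, order_for_size(), preferred_order(),
choose_order(), the SKETCH_* constants) is made up for the sketch, 4K pages
and a PMD order of 9 are assumed, the arch's preferred order is passed in as
a plain integer rather than coming from arch_wants_pte_order(), and only the
vma-bounds check is modelled (the "ptes already mapped" check is left out).

```c
#include <stdbool.h>

/* Hypothetical constants for the sketch; none of these names are from the patch. */
#define SKETCH_PAGE_SHIFT   12  /* assumption: 4K base pages */
#define SKETCH_PMD_ORDER    9   /* assumption: 2M PMD with 4K pages */
#define SKETCH_COSTLY_ORDER 3   /* same value as the kernel's PAGE_ALLOC_COSTLY_ORDER */

/* How the vma/process/system relates to THP, as described in the commit message. */
enum thp_hint { THP_DISABLED, THP_UNHINTED, THP_OPTED_IN };

/* Simplified stand-in for a vma: the half-open address range [start, end). */
struct sketch_vma {
	unsigned long start;
	unsigned long end;
};

/* Largest order below the PMD order whose folio size does not exceed max_bytes. */
static int order_for_size(unsigned long max_bytes)
{
	int order = 0;

	while (order + 1 < SKETCH_PMD_ORDER &&
	       (1UL << (SKETCH_PAGE_SHIFT + order + 1)) <= max_bytes)
		order++;
	return order;
}

/*
 * Preferred order before any fallback:
 *  - THP explicitly disabled: always order-0.
 *  - No explicit opt-in: the arch's order, capped by the unhinted maximum.
 *  - Explicit opt-in: the arch's order, uncapped.
 */
static int preferred_order(enum thp_hint hint, int arch_order,
			   unsigned long unhinted_max_bytes)
{
	int cap;

	if (hint == THP_DISABLED)
		return 0;
	if (hint == THP_OPTED_IN)
		return arch_order;

	cap = order_for_size(unhinted_max_bytes);
	return arch_order < cap ? arch_order : cap;
}

/* True if the naturally aligned folio of this order around addr lies inside the vma. */
static bool fits_in_vma(const struct sketch_vma *vma, unsigned long addr, int order)
{
	unsigned long size = 1UL << (SKETCH_PAGE_SHIFT + order);
	unsigned long base = addr & ~(size - 1);

	return base >= vma->start && base + size <= vma->end;
}

/* Fallback chain: preferred order, then the costly order, then order-0. */
static int choose_order(const struct sketch_vma *vma, unsigned long addr, int preferred)
{
	if (fits_in_vma(vma, addr, preferred))
		return preferred;
	if (preferred > SKETCH_COSTLY_ORDER &&
	    fits_in_vma(vma, addr, SKETCH_COSTLY_ORDER))
		return SKETCH_COSTLY_ORDER;
	return 0;
}
```

With these assumptions, an unhinted vma with a 64K limit and an arch
preference of order-6 would get preferred_order() == 4, and choose_order()
would then drop to order-3 or order-0 if the 64K-aligned range around the
fault address does not fit inside the vma.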