From: Ryan Roberts
To: Daniel Gomez
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)",
    David Hildenbrand, Barry Song, Lance Yang, Baolin Wang, Gavin Shan,
    Pankaj Raghav, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org"
Date: Mon, 22 Jul 2024 15:10:26 +0100
Subject: Re: [RFC PATCH v1 3/4] mm: Override mTHP "enabled" defaults at kernel cmdline
Message-ID: <3a499df9-f1ef-4552-b460-8585bf8bca92@arm.com>
References: <20240717071257.4141363-1-ryan.roberts@arm.com> <20240717071257.4141363-4-ryan.roberts@arm.com>

On 22/07/2024 10:36, Ryan Roberts wrote:
> On 22/07/2024 10:13, Daniel Gomez wrote:
>> On Wed, Jul 17, 2024 at 08:12:55AM GMT, Ryan Roberts wrote:
>>> Add thp_anon= cmdline parameter to allow specifying the default
>>> enablement of each supported anon THP size. The parameter accepts the
>>> following format and can be provided multiple times to configure each
>>> size:
>>>
>>>        thp_anon=<size>[KMG]:<state>
>>
>> Minor suggestion.
>> Should this be renamed to hp_anon= or hugepages_anon= instead?
>> This would align with the values under
>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB.
>
> "hp" doesn't feel right; that's not an abbreviation we use today to my knowledge.
> But I'd be happy to change it to "hugepages_anon", if that's the consensus.

Thinking about this a bit more, "hugepages=" is already a cmdline parameter used
to reserve hugepages for use with HugeTLB, so I think that name could get
confusing.

transparent_hugepage= is the existing cmdline parameter for the top-level
(anon) control. I considered "transparent_hugepage_anon=", or even just
extending the existing parameter to cover both the top-level and the per-size
controls (with an optional size):

    transparent_hugepage=[<size>[KMG]:]<state>

But given they likely need to be provided multiple times, both of those options
seem too long. Which is how I settled on thp_anon= (and in the next patch,
thp_file=).

>
>>
>>>
>>> See Documentation/admin-guide/mm/transhuge.rst for more details.
>>>
>>> Configuring the defaults at boot time is useful to allow early user
>>> space to take advantage of mTHP before it's been configured through
>>> sysfs.
>>>
>>> Signed-off-by: Ryan Roberts
>>> ---
>>>  .../admin-guide/kernel-parameters.txt         |  8 +++
>>>  Documentation/admin-guide/mm/transhuge.rst    | 26 +++++++--
>>>  mm/huge_memory.c                              | 55 ++++++++++++++++++-
>>>  3 files changed, 82 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index bc55fb55cd26..48443ad12e3f 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -6592,6 +6592,14 @@
>>>  			: poll all this frequency
>>>  			0: no polling (default)
>>>
>>> +	thp_anon=	[KNL]
>>> +			Format: <size>[KMG]:always|madvise|never|inherit
>>> +			Can be used to control the default behavior of the
>>> +			system with respect to anonymous transparent hugepages.
>>> +			Can be used multiple times for multiple anon THP sizes.
>>> +			See Documentation/admin-guide/mm/transhuge.rst for more
>>> +			details.
>>> +
>>>  	threadirqs	[KNL,EARLY]
>>>  			Force threading of all interrupt handlers except those
>>>  			marked explicitly IRQF_NO_THREAD.
>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>> index 1aaf8e3a0b5a..f53d43d986e2 100644
>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>> @@ -311,13 +311,27 @@ performance.
>>>  Note that any changes to the allowed set of sizes only applies to future
>>>  file-backed THP allocations.
>>>
>>> -Boot parameter
>>> -==============
>>> +Boot parameters
>>> +===============
>>>
>>> -You can change the sysfs boot time defaults of Transparent Hugepage
>>> -Support by passing the parameter ``transparent_hugepage=always`` or
>>> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
>>> -to the kernel command line.
>>> +You can change the sysfs boot time default for the top-level "enabled"
>>> +control by passing the parameter ``transparent_hugepage=always`` or
>>> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
>>> +kernel command line.
>>> +
>>> +Alternatively, each supported anonymous THP size can be controlled by
>>> +passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
>>> +and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
>>> +``inherit``.
>>> +
>>> +For example, the following will set 64K THP to ``always``::
>>> +
>>> +        thp_anon=64K:always
>>> +
>>> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
>>> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
>>> +not explicitly configured on the command line are implicitly set to
>>> +``never``.
>>>
>>>  Hugepages in tmpfs/shmem
>>>  ========================
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 4249c0bc9388..794d2790d90d 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -82,6 +82,7 @@ unsigned long huge_anon_orders_madvise __read_mostly;
>>>  unsigned long huge_anon_orders_inherit __read_mostly;
>>>  unsigned long huge_file_orders_always __read_mostly;
>>>  int huge_file_exec_order __read_mostly = -1;
>>> +static bool anon_orders_configured;
>>>
>>>  unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>  					 unsigned long vm_flags,
>>> @@ -763,7 +764,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
>>>  	 * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
>>>  	 * constant so we have to do this here.
>>>  	 */
>>> -	huge_anon_orders_inherit = BIT(PMD_ORDER);
>>> +	if (!anon_orders_configured) {
>>> +		huge_anon_orders_inherit = BIT(PMD_ORDER);
>>
>> PMD_ORDER for 64k base PS systems would result in a 512M value, which exceeds
>> the xarray limit [1]. Therefore, I think we need to avoid PMD-size orders by
>> checking if PMD_ORDER > MAX_PAGECACHE_ORDER.
>
> This is for anon memory, which isn't installed in the page cache, so it's
> independent of MAX_PAGECACHE_ORDER. I don't believe there is a problem here.
>
>>
>> [1] https://lore.kernel.org/all/20240627003953.1262512-1-gshan@redhat.com/
>>
>>> +		anon_orders_configured = true;
>>> +	}
>>>
>>>  	/*
>>>  	 * For pagecache, default to enabling all orders. powerpc's PMD_ORDER
>>> @@ -955,6 +959,55 @@ static int __init setup_transparent_hugepage(char *str)
>>>  }
>>>  __setup("transparent_hugepage=", setup_transparent_hugepage);
>>>
>>> +static int __init setup_thp_anon(char *str)
>>> +{
>>> +	unsigned long size;
>>> +	char *state;
>>> +	int order;
>>> +	int ret = 0;
>>> +
>>> +	if (!str)
>>> +		goto out;
>>> +
>>> +	size = (unsigned long)memparse(str, &state);
>>> +	order = ilog2(size >> PAGE_SHIFT);
>>> +	if (*state != ':' || !is_power_of_2(size) || size <= PAGE_SIZE ||
>>> +	    !(BIT(order) & THP_ORDERS_ALL_ANON))
>>> +		goto out;
>>> +
>>> +	state++;
>>> +
>>> +	if (!strcmp(state, "always")) {
>>> +		clear_bit(order, &huge_anon_orders_inherit);
>>> +		clear_bit(order, &huge_anon_orders_madvise);
>>> +		set_bit(order, &huge_anon_orders_always);
>>> +		ret = 1;
>>> +	} else if (!strcmp(state, "inherit")) {
>>> +		clear_bit(order, &huge_anon_orders_always);
>>> +		clear_bit(order, &huge_anon_orders_madvise);
>>> +		set_bit(order, &huge_anon_orders_inherit);
>>> +		ret = 1;
>>> +	} else if (!strcmp(state, "madvise")) {
>>> +		clear_bit(order, &huge_anon_orders_always);
>>> +		clear_bit(order, &huge_anon_orders_inherit);
>>> +		set_bit(order, &huge_anon_orders_madvise);
>>> +		ret = 1;
>>> +	} else if (!strcmp(state, "never")) {
>>> +		clear_bit(order, &huge_anon_orders_always);
>>> +		clear_bit(order, &huge_anon_orders_inherit);
>>> +		clear_bit(order, &huge_anon_orders_madvise);
>>> +		ret = 1;
>>> +	}
>>> +
>>> +	if (ret)
>>> +		anon_orders_configured = true;
>>> +out:
>>> +	if (!ret)
>>> +		pr_warn("thp_anon=%s: cannot parse, ignored\n", str);
>>> +	return ret;
>>> +}
>>> +__setup("thp_anon=", setup_thp_anon);
>>> +
>>>  pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>>>  {
>>>  	if (likely(vma->vm_flags & VM_WRITE))
>>> --
>>> 2.43.0
>
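
For anyone who wants to experiment with the parsing rules in the quoted
setup_thp_anon() outside the kernel, below is a rough userspace sketch. It is
not part of the patch. Everything prefixed sketch_ or SKETCH_ is hypothetical,
as are parse_size() (a simplified stand-in for memparse()) and the orders_*
variables (stand-ins for the huge_anon_orders_* bitmaps). It assumes 4K base
pages, a PMD order of 9, and that the valid anon orders are 2..PMD_ORDER.
Unlike the quoted code, it validates the size before computing the order.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumptions for illustration only: 4K base pages and a PMD order of 9. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PMD_ORDER	9
/* Assumed valid anon orders: 2..PMD_ORDER (stand-in for THP_ORDERS_ALL_ANON). */
#define SKETCH_ORDERS_ALL_ANON \
	(((1UL << (SKETCH_PMD_ORDER + 1)) - 1) & ~((1UL << 0) | (1UL << 1)))

/* Stand-ins for the huge_anon_orders_* bitmaps in the patch. */
unsigned long orders_always, orders_madvise, orders_inherit;

/* Simplified stand-in for memparse(): a number with an optional K/M/G suffix. */
unsigned long parse_size(const char *s, char **end)
{
	unsigned long v = strtoul(s, end, 0);

	switch (**end) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; (*end)++; break;
	default: break;
	}
	return v;
}

/* Index of the highest set bit; v must be non-zero. */
int sketch_ilog2(unsigned long v)
{
	int order = -1;

	while (v) {
		v >>= 1;
		order++;
	}
	return order;
}

/* Returns true if "<size>[KMG]:<state>" was parsed and applied. */
bool sketch_setup_thp_anon(const char *str)
{
	unsigned long size, bit;
	char *state;

	if (!str)
		return false;

	size = parse_size(str, &state);
	/* Size must be followed by ':', be a power of two, and exceed one page. */
	if (*state != ':' || size <= SKETCH_PAGE_SIZE || (size & (size - 1)))
		return false;

	bit = 1UL << sketch_ilog2(size >> SKETCH_PAGE_SHIFT);
	if (!(bit & SKETCH_ORDERS_ALL_ANON))
		return false;

	state++;
	if (strcmp(state, "always") && strcmp(state, "inherit") &&
	    strcmp(state, "madvise") && strcmp(state, "never"))
		return false;

	/* Each order lives in at most one bitmap; "never" leaves all clear. */
	orders_always &= ~bit;
	orders_madvise &= ~bit;
	orders_inherit &= ~bit;
	if (!strcmp(state, "always"))
		orders_always |= bit;
	else if (!strcmp(state, "inherit"))
		orders_inherit |= bit;
	else if (!strcmp(state, "madvise"))
		orders_madvise |= bit;

	return true;
}
```

Under those assumptions, sketch_setup_thp_anon("64K:always") sets bit 4 in
orders_always (64K / 4K = 16 pages, i.e. order 4), while "4K:always",
"48K:always" and "4M:always" are all rejected.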