From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8415FC52D71 for ; Fri, 9 Aug 2024 09:04:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 146BE6B0088; Fri, 9 Aug 2024 05:04:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0D0206B0089; Fri, 9 Aug 2024 05:04:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E8C8D6B008A; Fri, 9 Aug 2024 05:04:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id C24AE6B0088 for ; Fri, 9 Aug 2024 05:04:30 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 6C59B1C41A4 for ; Fri, 9 Aug 2024 09:04:30 +0000 (UTC) X-FDA: 82432121100.04.B001348 Received: from mail-ua1-f53.google.com (mail-ua1-f53.google.com [209.85.222.53]) by imf21.hostedemail.com (Postfix) with ESMTP id 9B2231C0004 for ; Fri, 9 Aug 2024 09:04:28 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=none; spf=pass (imf21.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.53 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723194181; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wQgvN8x7zaxI0q1kLK2PPMTGN/QVO8NMPgeZcIw1CHk=; b=fGRsJbJzWiBm+CU9SAduycBHkFCGOxBaHzl4mNSFmzaH/9S2Pf8J0edRJwQMD7A43CHhcs vSf42QAa5UQitDgk4vVcOMhRjUkrLX6GfyFwKzfsZ6EUV4T+CaMDn3uztC9aeDMXJWx88D ndFs0aWPdbpaQS7fUY2NTzE0+cmMZLw= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=none; spf=pass (imf21.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.53 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none) ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723194181; a=rsa-sha256; cv=none; b=iq2xikUaU31zdx8LYfKBD1eTwHp0Rj0xCHGDK6Wy2ATo0Rxe+w6vm6ZRxjgDEOBPYwn2S+ WKogqJXLI4ryjEl0iklmp5iANYvFM/76CTrxbtSaFYFujbg9jLYFXRYeRsIXo43CtaI8f8 R9a3ktxXXQ0xm4DZl0d5/+oAdCekV38= Received: by mail-ua1-f53.google.com with SMTP id a1e0cc1a2514c-824ae03efbfso487293241.3 for ; Fri, 09 Aug 2024 02:04:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723194268; x=1723799068; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wQgvN8x7zaxI0q1kLK2PPMTGN/QVO8NMPgeZcIw1CHk=; b=Uvgr77aFIdctv+6JXl+BvNNRZGHcjc3Dkdvi1twwaXL9gJtg48vLwV/YDu809cJGpu iDalTV3MreJqrffqCAPS7C51soyy/VZYotxklbDzOmvsgWlrfkSPkSKuGuooosGmGj1L 3ZalRD63QJS5U+3UP7hjiEKKo32mu14UvWqzHHPkValk0hkpmxGmj+cdpurSq4vgXbhE N6yt9jgRFjSVX08bH8QuDm8/qtG67zMDIWZxTaFvH2OLOdY9Y6Ni+EFp9/aSufKMC2ng 5TdPl/oN5eStgMdeQaQj+njZNxOBbeoZNNZXXXBop/1hONuV7WRwA9Hw9grVpuD1dDBK h8TA== X-Forwarded-Encrypted: i=1; AJvYcCVfMjS2KIJz1aund8qmDd7pVyQFLGDWwS3fO7Ch0yx0R8Z6YCI9MklyNk5d/OxrDiZo7qqs4lJMnchiHhhcpUwAyuo= X-Gm-Message-State: AOJu0YyOpHPtgMzJF+sZRU5kkrmcBivQuJGTS1PPFJblqS1xuHky4IK5 2IbiTRtp68fyUb5O0fKIrV4rI/rJEo2BsQNZW5A1qxgTHbcGTXtwK31b4xMagHKMWuuJZqkZc6S 8dumfpMAHGfaxAKG5LUd6YaF5L/o= X-Google-Smtp-Source: AGHT+IGnvkc9o05KWmhisA2GJLS04VxK9exbouzBsblXB2JOEPLt84fg2cV6LN2kDwyDW8t6Fdg8gVC5A5C1Y2xnUQI= X-Received: by 2002:a05:6122:470b:b0:4ed:145:348f with SMTP id 71dfb90a1353d-4f912ee0aafmr1124697e0c.12.1723194267473; Fri, 09 Aug 2024 02:04:27 -0700 (PDT) MIME-Version: 1.0 References: <20240808101700.571701-1-ryan.roberts@arm.com> In-Reply-To: From: Barry Song Date: Fri, 9 Aug 2024 17:04:15 +0800 Message-ID: Subject: Re: [PATCH v2] mm: Override mTHP "enabled" defaults at kernel cmdline To: Ryan Roberts Cc: Andrew Morton , Jonathan Corbet , David Hildenbrand , Lance Yang , Baolin Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 9B2231C0004 X-Stat-Signature: g7zdddjmgk6a94ikeojspe9h4hdoais6 X-Rspam-User: X-HE-Tag: 1723194268-216038 X-HE-Meta: U2FsdGVkX1/0Y/fnA86IplF63eRffT5RWNZy77+eiU7JRvHFIwbrvCzam1GRmr3CHt+2RrYR05u6WB4hKiwN70cqVt4JhaZHa8FsCaZ7YXXcUknKppZinWp+brkl53zjwarfI4uNCcN8eQk1k06xiyTM2+w6QzEAJyd1BoKtH2vEldSmLlFeFGcEP04w6MKa9Yr8jQSmlYjVTY/f+KlNiQh8rzn+DMgQ2oNlBNDmweyYl9SqfFJcITHK+jdf4manYWSDe5WXJfC0LCWVd/kdFzZX3i/4ay70AsMy34EgFN6EIhLERCj40JED/Xic/ZpS2GRm6Y1YSy/a0ARTkw6qVoy7rOorahhSIiaROHshhMApPTMQq3UnxvAT7BfZS+JtPruTUh+Fa8K/rqIt0ZOW2/KgIwZ5nezt2A3YehLNtnfjieWSZA8o8BMHlIbZIrcdCh8pYAtzR6Kfu4TqQc7A0VeSelbxiti628CPdRTpOh/uuN+aFduA7rQELPWFYrz0r3+HPBMV8Yn4GvxdfbnMOCYAnk5xSshnm2MxIBM+p2Tep+zeF63FzMoHZk43E5/iUwkKaXj8fMf0oqlAJ1rEOVPI3gHUhFlsliEoEFjemwVM8jGqAqDc6IV1+GZ33D1F5rMikRYmnwYWAnWCHxK4dNPrwJiRvA/xk1MmSHIeozIOofZmINe4od1py0ZkpkdpTbmL5NuC/zyJd6BWRNTJN4R7l7LA38Aj1lSL4cbxdbX0WelK3VenuAiUqo6l6o+xV8R1/Wsw3EqofbsgtENp0fI83G4jasdPxPiKa6+oLsxaUXzOEDsorKcscpdDj3BlAN37ieoP0qZ9pOrG0m/Wa7y8EuaAlvSJuNUfgoFan18LB6qlIq8LUlTCK4W+X5FJn/Jr1UDQu+f60zvOY1vtOe0Yd8KcRR9gvO4YIf/5KamQY36tarBBc5p9pef9hTUNFpmkx8WKFdkAwSHApQW Kh4L58DH EbX66j8FEkj8FYdZDm6U6MMT46SenuYgn9GutfZJdG9qJp4iHSQG3B22QdFbBhcSFiUGjUfk8ZYFPk1xpwXNzfG9uzO5Luvp5kCvA8ZNwYzRjNriFnjBOw0Hn4J/bzU4PKf6zrIPWgA6FbNSEQz/iyOQqZrg9ob+HPUrnYHRoxNnmLeGeyYB+0pEJbVxZmQAHC1dpTmCY8/5e2MnP6bcw9XKfzlINZZbM7ZIfTmaPSnYXHsqMEHSeLohAx/XB8LXbO5PyVUd2WYj2hiv9ZV7Ex+moeLyRL8lsEmr9Mn9aXPC5Pmg4A0OWp6RjRaf7gZJttFlDVckMEv+z9MH/EyXAmO3J6ObiYIgIhuEGSOpy5bbS5g/q/ht/KKTA9kDwAEcODaGEkXXPicjcy5Y= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Aug 9, 2024 at 4:57=E2=80=AFPM Ryan Roberts = wrote: > > On 09/08/2024 09:31, Barry Song wrote: > > On Fri, Aug 9, 2024 at 3:58=E2=80=AFPM Ryan Roberts wrote: > >> > >> On 08/08/2024 22:17, Barry Song wrote: > >>> On Thu, Aug 8, 2024 at 10:17=E2=80=AFPM Ryan Roberts wrote: > >>>> > >>>> Add thp_anon=3D cmdline parameter to allow specifying the default > >>>> enablement of each supported anon THP size. The parameter accepts th= e > >>>> following format and can be provided multiple times to configure eac= h > >>>> size: > >>>> > >>>> thp_anon=3D[KMG]: > >>>> > >>>> See Documentation/admin-guide/mm/transhuge.rst for more details. > >>>> > >>>> Configuring the defaults at boot time is useful to allow early user > >>>> space to take advantage of mTHP before its been configured through > >>>> sysfs. > >>>> > >>>> Signed-off-by: Ryan Roberts > >>>> --- > >>>> > >>>> Hi All, > >>>> > >>>> I've split this off from my RFC at [1] because Barry highlighted tha= t he would > >>>> benefit from it immediately [2]. There are no changes vs the version= in that > >>>> series. > >>>> > >>>> It applies against today's mm-unstable (275d686abcb59). (although I = had to fix a > >>>> minor build bug in stackdepot.c due to MIN() not being defined in th= is tree). > >>>> > >>>> Thanks, > >>>> Ryan > >>>> > >>>> > >>>> .../admin-guide/kernel-parameters.txt | 8 +++ > >>>> Documentation/admin-guide/mm/transhuge.rst | 26 +++++++-- > >>>> mm/huge_memory.c | 55 ++++++++++++++++= ++- > >>>> 3 files changed, 82 insertions(+), 7 deletions(-) > >>>> > >>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Docum= entation/admin-guide/kernel-parameters.txt > >>>> index bcdee8984e1f0..5c79b58c108ec 100644 > >>>> --- a/Documentation/admin-guide/kernel-parameters.txt > >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt > >>>> @@ -6631,6 +6631,14 @@ > >>>> : poll all this frequency > >>>> 0: no polling (default) > >>>> > >>>> + thp_anon=3D [KNL] > >>>> + Format: [KMG]:always|madvise|never|inh= erit > >>>> + Can be used to control the default behavior = of the > >>>> + system with respect to anonymous transparent= hugepages. > >>>> + Can be used multiple times for multiple anon= THP sizes. > >>>> + See Documentation/admin-guide/mm/transhuge.r= st for more > >>>> + details. > >>>> + > >>>> threadirqs [KNL,EARLY] > >>>> Force threading of all interrupt handlers ex= cept those > >>>> marked explicitly IRQF_NO_THREAD. > >>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentat= ion/admin-guide/mm/transhuge.rst > >>>> index 24eec1c03ad88..f63b0717366c6 100644 > >>>> --- a/Documentation/admin-guide/mm/transhuge.rst > >>>> +++ b/Documentation/admin-guide/mm/transhuge.rst > >>>> @@ -284,13 +284,27 @@ that THP is shared. Exceeding the number would= block the collapse:: > >>>> > >>>> A higher value may increase memory footprint for some workloads. > >>>> > >>>> -Boot parameter > >>>> -=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >>>> +Boot parameters > >>>> +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >>>> > >>>> -You can change the sysfs boot time defaults of Transparent Hugepage > >>>> -Support by passing the parameter ``transparent_hugepage=3Dalways`` = or > >>>> -``transparent_hugepage=3Dmadvise`` or ``transparent_hugepage=3Dneve= r`` > >>>> -to the kernel command line. > >>>> +You can change the sysfs boot time default for the top-level "enabl= ed" > >>>> +control by passing the parameter ``transparent_hugepage=3Dalways`` = or > >>>> +``transparent_hugepage=3Dmadvise`` or ``transparent_hugepage=3Dneve= r`` to the > >>>> +kernel command line. > >>>> + > >>>> +Alternatively, each supported anonymous THP size can be controlled = by > >>>> +passing ``thp_anon=3D[KMG]:``, where ```` is the= THP size > >>>> +and ```` is one of ``always``, ``madvise``, ``never`` or > >>>> +``inherit``. > >>>> + > >>>> +For example, the following will set 64K THP to ``always``:: > >>>> + > >>>> + thp_anon=3D64K:always > >>>> + > >>>> +``thp_anon=3D`` may be specified multiple times to configure all TH= P sizes as > >>>> +required. If ``thp_anon=3D`` is specified at least once, any anon T= HP sizes > >>>> +not explicitly configured on the command line are implicitly set to > >>>> +``never``. > >>>> > >>>> Hugepages in tmpfs/shmem > >>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D > >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c > >>>> index 0c3075ee00012..c2c0da1eb94e6 100644 > >>>> --- a/mm/huge_memory.c > >>>> +++ b/mm/huge_memory.c > >>>> @@ -82,6 +82,7 @@ unsigned long huge_zero_pfn __read_mostly =3D ~0UL= ; > >>>> unsigned long huge_anon_orders_always __read_mostly; > >>>> unsigned long huge_anon_orders_madvise __read_mostly; > >>>> unsigned long huge_anon_orders_inherit __read_mostly; > >>>> +static bool anon_orders_configured; > >>>> > >>>> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma= , > >>>> unsigned long vm_flags, > >>>> @@ -672,7 +673,10 @@ static int __init hugepage_init_sysfs(struct ko= bject **hugepage_kobj) > >>>> * disable all other sizes. powerpc's PMD_ORDER isn't a comp= ile-time > >>>> * constant so we have to do this here. > >>>> */ > >>>> - huge_anon_orders_inherit =3D BIT(PMD_ORDER); > >>>> + if (!anon_orders_configured) { > >>>> + huge_anon_orders_inherit =3D BIT(PMD_ORDER); > >>>> + anon_orders_configured =3D true; > >>>> + } > >>> > >>> If a user configures 64KB and doesn't adjust anything for PMD_ORDER, > >>> then PMD_ORDER will be set to "never", correct? This seems to change > >>> the default behavior of PMD_ORDER. Could we instead achieve this by > >>> checking if PMD_ORDER has been explicitly configured? > >> > >> Yes, that's how it's implemented in this patch, and the accompanying d= ocs also > >> state: > >> > >> If ``thp_anon=3D`` is specified at least once, any anon THP sizes > >> not explicitly configured on the command line are implicitly set to > >> ``never``. > >> > >> My initial approach did exactly as you suggest. But in the original se= ries, I > >> also had a similar patch to configure file thp with "thp_file=3D". And= for file, > >> all of the orders default to `always`. So if taking the same approach = with that > >> control, the user would have to explicitly opt-out of all supported or= ders > >> rather than just opt-in to the orders they want. And I thought that co= uld get > >> tricky in future if support is added for more orders. I felt that was > >> potentially very confusing so decided it was clearer to have the above= rule and > >> make both controls consistent. > >> > >> What do you think? > > > > If this is the intention, once the user sets the command line, they sho= uld > > realize that the default settings have been overridden. I am perfectly = fine > > with this strategy. > > OK. I can see it from both sides to be honest. Let's see if anyone else h= as > issue with this approach. > > > > > with the below cmdline: > > thp_anon=3D64K:always thp_anon=3D8K:inherit thp_anon=3D32K:madvise > > thp_anon=3D1M:inherit thp_anon=3D2M:always > > > > I am getting: > > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled > > [always] inherit madvise never > > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-32kB/enabled > > always inherit [madvise] never > > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled > > always [inherit] madvise never > > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled > > [always] inherit madvise never > > And you should also be seeing a warning in the boot log that thp_anon=3D8= K:inherit > is unrecognised (since anon doesn't support order-1). Indeed. I am even seeing "Unknown kernel command line parameters "thp_anon=3D8K:inherit", will be passed to user space." [ 0.000000] huge_memory: thp_anon=3D8K:inherit: cannot parse, ignored [ 0.000000] Unknown kernel command line parameters "thp_anon=3D8K:inherit", will be passed to user space. [ 0.000000] random: crng init done [ 0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes, linear) [ 0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 0.000000] Fallback order for Node 0: 0 [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 13107= 2 [ 0.000000] Policy zone: DMA > > > > > Thus, > > > > Tested-by: Barry Song > > Thanks! > > > > >> > >> > >>> > >>>> > >>>> *hugepage_kobj =3D kobject_create_and_add("transparent_hugep= age", mm_kobj); > >>>> if (unlikely(!*hugepage_kobj)) { > >>>> @@ -857,6 +861,55 @@ static int __init setup_transparent_hugepage(ch= ar *str) > >>>> } > >>>> __setup("transparent_hugepage=3D", setup_transparent_hugepage); > >>>> > >>>> +static int __init setup_thp_anon(char *str) > >>>> +{ > >>>> + unsigned long size; > >>>> + char *state; > >>>> + int order; > >>>> + int ret =3D 0; > >>>> + > >>>> + if (!str) > >>>> + goto out; > >>>> + > >>>> + size =3D (unsigned long)memparse(str, &state); > >>>> + order =3D ilog2(size >> PAGE_SHIFT); > >>>> + if (*state !=3D ':' || !is_power_of_2(size) || size <=3D PAG= E_SIZE || > >>>> + !(BIT(order) & THP_ORDERS_ALL_ANON)) > >>>> + goto out; > >>>> + > >>>> + state++; > >>>> + > >>>> + if (!strcmp(state, "always")) { > >>>> + clear_bit(order, &huge_anon_orders_inherit); > >>>> + clear_bit(order, &huge_anon_orders_madvise); > >>>> + set_bit(order, &huge_anon_orders_always); > >>>> + ret =3D 1; > >>>> + } else if (!strcmp(state, "inherit")) { > >>>> + clear_bit(order, &huge_anon_orders_always); > >>>> + clear_bit(order, &huge_anon_orders_madvise); > >>>> + set_bit(order, &huge_anon_orders_inherit); > >>>> + ret =3D 1; > >>>> + } else if (!strcmp(state, "madvise")) { > >>>> + clear_bit(order, &huge_anon_orders_always); > >>>> + clear_bit(order, &huge_anon_orders_inherit); > >>>> + set_bit(order, &huge_anon_orders_madvise); > >>>> + ret =3D 1; > >>>> + } else if (!strcmp(state, "never")) { > >>>> + clear_bit(order, &huge_anon_orders_always); > >>>> + clear_bit(order, &huge_anon_orders_inherit); > >>>> + clear_bit(order, &huge_anon_orders_madvise); > >>>> + ret =3D 1; > >>>> + } > >>>> + > >>>> + if (ret) > >>>> + anon_orders_configured =3D true; > >>> > >>> I mean: > >>> > >>> if (ret && order =3D=3D PMD_ORDER) > >>> anon_pmd_order_configured =3D true; > >>> > >>>> +out: > >>>> + if (!ret) > >>>> + pr_warn("thp_anon=3D%s: cannot parse, ignored\n", st= r); > >>>> + return ret; > >>>> +} > >>>> +__setup("thp_anon=3D", setup_thp_anon); > >>>> + > >>>> pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) > >>>> { > >>>> if (likely(vma->vm_flags & VM_WRITE)) > >>>> -- > >>>> 2.43.0 > >>>> > >>> > >>> Thanks > >>> Barry > >> > >