From: Barry Song
Date: Fri, 9 Aug 2024 16:31:10 +0800
Subject: Re: [PATCH v2] mm: Override mTHP "enabled" defaults at kernel cmdline
To: Ryan Roberts
Cc: Andrew Morton, Jonathan Corbet, David Hildenbrand, Lance Yang,
    Baolin Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240808101700.571701-1-ryan.roberts@arm.com>

On Fri, Aug 9, 2024 at 3:58 PM Ryan Roberts wrote:
>
> On 08/08/2024 22:17, Barry Song wrote:
> > On Thu, Aug 8, 2024 at 10:17 PM Ryan Roberts wrote:
> >>
> >> Add thp_anon= cmdline parameter to allow specifying the default
> >> enablement of each supported anon THP size. The parameter accepts the
> >> following format and can be provided multiple times to configure each
> >> size:
> >>
> >> thp_anon=[KMG]:
> >>
> >> See Documentation/admin-guide/mm/transhuge.rst for more details.
> >>
> >> Configuring the defaults at boot time is useful to allow early user
> >> space to take advantage of mTHP before its been configured through
> >> sysfs.
> >>
> >> Signed-off-by: Ryan Roberts
> >> ---
> >>
> >> Hi All,
> >>
> >> I've split this off from my RFC at [1] because Barry highlighted that he would
> >> benefit from it immediately [2]. There are no changes vs the version in that
> >> series.
> >>
> >> It applies against today's mm-unstable (275d686abcb59). (although I had to fix a
> >> minor build bug in stackdepot.c due to MIN() not being defined in this tree).
> >>
> >> Thanks,
> >> Ryan
> >>
> >>
> >>  .../admin-guide/kernel-parameters.txt         |  8 +++
> >>  Documentation/admin-guide/mm/transhuge.rst    | 26 +++++++--
> >>  mm/huge_memory.c                              | 55 ++++++++++++++++++-
> >>  3 files changed, 82 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index bcdee8984e1f0..5c79b58c108ec 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -6631,6 +6631,14 @@
> >>                         : poll all this frequency
> >>                         0: no polling (default)
> >>
> >> +       thp_anon=       [KNL]
> >> +                       Format: <size>[KMG]:always|madvise|never|inherit
> >> +                       Can be used to control the default behavior of the
> >> +                       system with respect to anonymous transparent hugepages.
> >> +                       Can be used multiple times for multiple anon THP sizes.
> >> +                       See Documentation/admin-guide/mm/transhuge.rst for more
> >> +                       details.
> >> +
> >>         threadirqs      [KNL,EARLY]
> >>                         Force threading of all interrupt handlers except those
> >>                         marked explicitly IRQF_NO_THREAD.
> >> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >> index 24eec1c03ad88..f63b0717366c6 100644
> >> --- a/Documentation/admin-guide/mm/transhuge.rst
> >> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >> @@ -284,13 +284,27 @@ that THP is shared. Exceeding the number would block the collapse::
> >>
> >>  A higher value may increase memory footprint for some workloads.
> >>
> >> -Boot parameter
> >> -==============
> >> +Boot parameters
> >> +===============
> >>
> >> -You can change the sysfs boot time defaults of Transparent Hugepage
> >> -Support by passing the parameter ``transparent_hugepage=always`` or
> >> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> >> -to the kernel command line.
> >> +You can change the sysfs boot time default for the top-level "enabled"
> >> +control by passing the parameter ``transparent_hugepage=always`` or
> >> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> >> +kernel command line.
> >> +
> >> +Alternatively, each supported anonymous THP size can be controlled by
> >> +passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
> >> +and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
> >> +``inherit``.
> >> +
> >> +For example, the following will set 64K THP to ``always``::
> >> +
> >> +       thp_anon=64K:always
> >> +
> >> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> >> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> >> +not explicitly configured on the command line are implicitly set to
> >> +``never``.
> >>
> >>  Hugepages in tmpfs/shmem
> >>  ========================
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 0c3075ee00012..c2c0da1eb94e6 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -82,6 +82,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >>  unsigned long huge_anon_orders_always __read_mostly;
> >>  unsigned long huge_anon_orders_madvise __read_mostly;
> >>  unsigned long huge_anon_orders_inherit __read_mostly;
> >> +static bool anon_orders_configured;
> >>
> >>  unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> >>                                          unsigned long vm_flags,
> >> @@ -672,7 +673,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
> >>          * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
> >>          * constant so we have to do this here.
> >>          */
> >> -       huge_anon_orders_inherit = BIT(PMD_ORDER);
> >> +       if (!anon_orders_configured) {
> >> +               huge_anon_orders_inherit = BIT(PMD_ORDER);
> >> +               anon_orders_configured = true;
> >> +       }
> >
> > If a user configures 64KB and doesn't adjust anything for PMD_ORDER,
> > then PMD_ORDER will be set to "never", correct? This seems to change
> > the default behavior of PMD_ORDER. Could we instead achieve this by
> > checking if PMD_ORDER has been explicitly configured?
>
> Yes, that's how it's implemented in this patch, and the accompanying docs also
> state:
>
>   If ``thp_anon=`` is specified at least once, any anon THP sizes
>   not explicitly configured on the command line are implicitly set to
>   ``never``.
>
> My initial approach did exactly as you suggest. But in the original series, I
> also had a similar patch to configure file thp with "thp_file=". And for file,
> all of the orders default to `always`. So if taking the same approach with that
> control, the user would have to explicitly opt-out of all supported orders
> rather than just opt-in to the orders they want. And I thought that could get
> tricky in future if support is added for more orders. I felt that was
> potentially very confusing so decided it was clearer to have the above rule and
> make both controls consistent.
>
> What do you think?

If this is the intention, once the user sets the command line, they
should realize that the default settings have been overridden. I am
perfectly fine with this strategy.

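The rule discussed above can be modelled in user space to make it concrete.
This is only a sketch, not kernel code: the `model_*` names are hypothetical,
and `MODEL_PMD_ORDER` is assumed to be 9 (4K base pages with a 2M PMD). It
mirrors the two places where the patch touches the three order masks: an
accepted thp_anon= entry, and the fallback in hugepage_init_sysfs().

```c
/* User-space model of the defaulting rule; illustrative only. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_PMD_ORDER 9	/* assumption: 4K base pages, 2M PMD */

enum model_state { MODEL_NEVER, MODEL_ALWAYS, MODEL_MADVISE, MODEL_INHERIT };

struct model {
	unsigned long always;
	unsigned long madvise;
	unsigned long inherit;
	bool configured;	/* stands in for anon_orders_configured */
};

/* What one accepted thp_anon= entry does: move the order to one mask. */
static void model_set(struct model *m, int order, enum model_state s)
{
	unsigned long bit;

	assert(order >= 0 && order <= MODEL_PMD_ORDER);
	bit = 1UL << order;

	m->always &= ~bit;
	m->madvise &= ~bit;
	m->inherit &= ~bit;

	if (s == MODEL_ALWAYS)
		m->always |= bit;
	else if (s == MODEL_MADVISE)
		m->madvise |= bit;
	else if (s == MODEL_INHERIT)
		m->inherit |= bit;

	m->configured = true;
}

/* The init-time fallback: PMD order inherits only if nothing was set. */
static void model_apply_default(struct model *m)
{
	if (!m->configured) {
		m->inherit = 1UL << MODEL_PMD_ORDER;
		m->configured = true;
	}
}

/* Report which mask an order ended up in; no mask means "never". */
static enum model_state model_get(const struct model *m, int order)
{
	unsigned long bit = 1UL << order;

	if (m->always & bit)
		return MODEL_ALWAYS;
	if (m->madvise & bit)
		return MODEL_MADVISE;
	if (m->inherit & bit)
		return MODEL_INHERIT;
	return MODEL_NEVER;
}
```

In this model, setting only order 4 (64K with 4K pages) to always and then
applying the default leaves the PMD order at never, while an untouched model
ends up with the PMD order at inherit.
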
With the below cmdline:

thp_anon=64K:always thp_anon=8K:inherit thp_anon=32K:madvise
thp_anon=1M:inherit thp_anon=2M:always

I am getting:

/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
[always] inherit madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-32kB/enabled
always inherit [madvise] never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled
always [inherit] madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
[always] inherit madvise never

Thus,
Tested-by: Barry Song

> >
> >
> >>
> >>         *hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
> >>         if (unlikely(!*hugepage_kobj)) {
> >> @@ -857,6 +861,55 @@ static int __init setup_transparent_hugepage(char *str)
> >>  }
> >>  __setup("transparent_hugepage=", setup_transparent_hugepage);
> >>
> >> +static int __init setup_thp_anon(char *str)
> >> +{
> >> +       unsigned long size;
> >> +       char *state;
> >> +       int order;
> >> +       int ret = 0;
> >> +
> >> +       if (!str)
> >> +               goto out;
> >> +
> >> +       size = (unsigned long)memparse(str, &state);
> >> +       order = ilog2(size >> PAGE_SHIFT);
> >> +       if (*state != ':' || !is_power_of_2(size) || size <= PAGE_SIZE ||
> >> +           !(BIT(order) & THP_ORDERS_ALL_ANON))
> >> +               goto out;
> >> +
> >> +       state++;
> >> +
> >> +       if (!strcmp(state, "always")) {
> >> +               clear_bit(order, &huge_anon_orders_inherit);
> >> +               clear_bit(order, &huge_anon_orders_madvise);
> >> +               set_bit(order, &huge_anon_orders_always);
> >> +               ret = 1;
> >> +       } else if (!strcmp(state, "inherit")) {
> >> +               clear_bit(order, &huge_anon_orders_always);
> >> +               clear_bit(order, &huge_anon_orders_madvise);
> >> +               set_bit(order, &huge_anon_orders_inherit);
> >> +               ret = 1;
> >> +       } else if (!strcmp(state, "madvise")) {
> >> +               clear_bit(order, &huge_anon_orders_always);
> >> +               clear_bit(order, &huge_anon_orders_inherit);
> >> +               set_bit(order, &huge_anon_orders_madvise);
> >> +               ret = 1;
> >> +       } else if (!strcmp(state, "never")) {
> >> +               clear_bit(order, &huge_anon_orders_always);
> >> +               clear_bit(order, &huge_anon_orders_inherit);
> >> +               clear_bit(order, &huge_anon_orders_madvise);
> >> +               ret = 1;
> >> +       }
> >> +
> >> +       if (ret)
> >> +               anon_orders_configured = true;
> >
> > I mean:
> >
> > if (ret && order == PMD_ORDER)
> >         anon_pmd_order_configured = true;
> >
> >> +out:
> >> +       if (!ret)
> >> +               pr_warn("thp_anon=%s: cannot parse, ignored\n", str);
> >> +       return ret;
> >> +}
> >> +__setup("thp_anon=", setup_thp_anon);
> >> +
> >>  pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
> >>  {
> >>         if (likely(vma->vm_flags & VM_WRITE))
> >> --
> >> 2.43.0
> >>
> >
> > Thanks
> > Barry
>
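
The size check in the quoted setup_thp_anon() can also be tried out in user
space. The following is a rough sketch under assumptions, not the kernel
implementation: `sketch_parse_order()` is a hypothetical helper, the suffix
handling is a simplified stand-in for memparse() covering only K, M and G,
the page shift is assumed to be 12, the PMD order is assumed to be 9, and the
allowed mask assumes that orders 2 through the PMD order are valid for anon
THP.

```c
/* User-space sketch of the "<size>[KMG]:" validation; illustrative only. */
#include <assert.h>
#include <stdlib.h>

#define SK_PAGE_SHIFT 12	/* assumption: 4K base pages */
#define SK_PMD_ORDER 9		/* assumption: 2M PMD */
/* assumption: orders 0 and 1 are not valid anon THP orders */
#define SK_ORDERS_ALL_ANON \
	(((1UL << (SK_PMD_ORDER + 1)) - 1) & ~((1UL << 0) | (1UL << 1)))

/*
 * Parse the size prefix of one entry. Returns the order and points
 * *state at the text after ':' on success, or returns -1 if rejected.
 */
static int sketch_parse_order(const char *str, const char **state)
{
	char *end;
	unsigned long long size = strtoull(str, &end, 0);
	unsigned long long pages;
	int order = 0;

	/* Simplified suffix scaling: each step multiplies by 1024. */
	switch (*end) {
	case 'G': case 'g':
		size <<= 10;
		/* fall through */
	case 'M': case 'm':
		size <<= 10;
		/* fall through */
	case 'K': case 'k':
		size <<= 10;
		end++;
		break;
	default:
		break;
	}

	if (*end != ':')
		return -1;

	/* Must be a power of two and strictly larger than one page. */
	if (size == 0 || (size & (size - 1)) ||
	    size <= (1ULL << SK_PAGE_SHIFT))
		return -1;

	/* Integer log2 of the size expressed in pages. */
	for (pages = size >> SK_PAGE_SHIFT; pages > 1; pages >>= 1)
		order++;

	if (order > SK_PMD_ORDER || !((1UL << order) & SK_ORDERS_ALL_ANON))
		return -1;

	*state = end + 1;
	return order;
}
```

Under these assumptions 64K maps to order 4 and 2M to order 9, while 8K
(order 1), 48K (not a power of two) and 4K (a single page) are all rejected.
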