* [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
From: Barry Song @ 2024-08-14 2:02 UTC
To: akpm, linux-mm
Cc: baohua, baolin.wang, corbet, david, ioworker0, linux-kernel,
ryan.roberts, v-songbaohua
From: Ryan Roberts <ryan.roberts@arm.com>
Add thp_anon= cmdline parameter to allow specifying the default
enablement of each supported anon THP size. The parameter accepts the
following format and can be provided multiple times to configure each
size:
thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
An example:
thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
See Documentation/admin-guide/mm/transhuge.rst for more details.
Configuring the defaults at boot time is useful to allow early user
space to take advantage of mTHP before it's been configured through
sysfs.
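To make the format concrete, here is a minimal userspace sketch (not the kernel code in this patch) of how such a string could be turned into per-policy order bitmasks. It assumes 4 KiB base pages; struct thp_masks, parse_size(), size_to_order() and parse_thp_anon() are hypothetical names chosen for illustration, and the power-of-two check is stricter than what the patch itself does.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical container for the three per-policy order bitmasks. */
struct thp_masks {
	unsigned long always, madvise, inherit;
};

/* Hypothetical stand-in for memparse(): number plus optional K/M/G suffix. */
unsigned long parse_size(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 0);

	if (*end == 'G')
		v <<= 30;
	else if (*end == 'M')
		v <<= 20;
	else if (*end == 'K')
		v <<= 10;
	return v;
}

/* Order of a size, or -1 unless it is a power of two of at least one page. */
int size_to_order(const char *s)
{
	unsigned long size = parse_size(s);
	int order = 0;

	if (size < (1UL << PAGE_SHIFT) || (size & (size - 1)))
		return -1;
	for (size >>= PAGE_SHIFT; size > 1; size >>= 1)
		order++;
	return order;
}

/* Apply "<sizes>:<policy>;..." to *m; returns 0 on success, -1 on error. */
int parse_thp_anon(const char *arg, struct thp_masks *m)
{
	struct thp_masks tmp = *m;	/* commit only if everything parses */
	char buf[256];
	char *entry, *next;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	for (entry = buf; entry; entry = next) {
		char *policy, *item, *item_next;

		next = strchr(entry, ';');
		if (next)
			*next++ = '\0';
		policy = strchr(entry, ':');
		if (!policy)
			return -1;
		*policy++ = '\0';

		for (item = entry; item; item = item_next) {
			char *dash;
			int start, end;
			unsigned long bits;

			item_next = strchr(item, ',');
			if (item_next)
				*item_next++ = '\0';
			dash = strchr(item, '-');
			if (dash)
				*dash++ = '\0';
			start = size_to_order(item);
			end = dash ? size_to_order(dash) : start;
			if (start < 0 || end < 0 || start > end)
				return -1;

			/* One policy per order: clear everywhere, then set. */
			bits = ((1UL << (end - start + 1)) - 1) << start;
			tmp.always &= ~bits;
			tmp.madvise &= ~bits;
			tmp.inherit &= ~bits;
			if (!strcmp(policy, "always"))
				tmp.always |= bits;
			else if (!strcmp(policy, "madvise"))
				tmp.madvise |= bits;
			else if (!strcmp(policy, "inherit"))
				tmp.inherit |= bits;
			else if (strcmp(policy, "never"))
				return -1;
		}
	}
	*m = tmp;
	return 0;
}
```

Under the 4 KiB assumption, feeding the example string above into parse_thp_anon() with all masks initially zero gives always = 0x1c (orders 2 to 4), inherit = 0xa0 (orders 5 and 7) and madvise = 0x40 (order 6), with orders 8 and 9 left clear for "never".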
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
-v4:
* use bitmap APIs to set and clear bits. thanks very much for
David's comment!
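For readers unfamiliar with the bitmap helpers mentioned in this changelog: bitmap_set() and bitmap_clear() take (map, start, nbits) and operate on a run of consecutive bits. The following is a rough single-word userspace emulation, offered only as a sketch; range_mask(), word_bitmap_set() and word_bitmap_clear() are hypothetical names, and the real kernel helpers also handle bitmaps spanning several words.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask with nbits consecutive bits set starting at bit start (one word only). */
unsigned long range_mask(unsigned int start, unsigned int nbits)
{
	unsigned long low = nbits >= BITS_PER_LONG ? ~0UL : (1UL << nbits) - 1;

	return low << start;
}

/* Single-word analogue of bitmap_set(map, start, nbits). */
void word_bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits)
{
	*map |= range_mask(start, nbits);
}

/* Single-word analogue of bitmap_clear(map, start, nbits). */
void word_bitmap_clear(unsigned long *map, unsigned int start, unsigned int nbits)
{
	*map &= ~range_mask(start, nbits);
}
```

Setting a range in one policy mask and clearing the same range in the other two is what keeps each order assigned to at most one policy.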
.../admin-guide/kernel-parameters.txt | 9 ++
Documentation/admin-guide/mm/transhuge.rst | 37 +++++--
mm/huge_memory.c | 96 ++++++++++++++++++-
3 files changed, 134 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f0057bac20fb..d0d141d50638 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6629,6 +6629,15 @@
<deci-seconds>: poll all this frequency
0: no polling (default)
+ thp_anon= [KNL]
+ Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
+ state is one of "always", "madvise", "never" or "inherit".
+ Can be used to control the default behavior of the
+ system with respect to anonymous transparent hugepages.
+ Can be used multiple times for multiple anon THP sizes.
+ See Documentation/admin-guide/mm/transhuge.rst for more
+ details.
+
threadirqs [KNL,EARLY]
Force threading of all interrupt handlers except those
marked explicitly IRQF_NO_THREAD.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7072469de8a8..528e1a19d63f 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -284,13 +284,36 @@ that THP is shared. Exceeding the number would block the collapse::
A higher value may increase memory footprint for some workloads.
-Boot parameter
-==============
-
-You can change the sysfs boot time defaults of Transparent Hugepage
-Support by passing the parameter ``transparent_hugepage=always`` or
-``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
-to the kernel command line.
+Boot parameters
+===============
+
+You can change the sysfs boot time default for the top-level "enabled"
+control by passing the parameter ``transparent_hugepage=always`` or
+``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
+kernel command line.
+
+Alternatively, each supported anonymous THP size can be controlled by
+passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
+where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
+``madvise``, ``never`` or ``inherit``.
+
+For example, the following will set 16K, 32K, 64K THP to ``always``,
+set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
+to ``never``::
+
+ thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
+
+``thp_anon=`` may be specified multiple times to configure all THP sizes as
+required. If ``thp_anon=`` is specified at least once, any anon THP sizes
+not explicitly configured on the command line are implicitly set to
+``never``.
+
+``transparent_hugepage`` setting only affects the global toggle. If
+``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
+However, if a valid ``thp_anon`` setting is provided by the user, the
+PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
+is not defined within a valid ``thp_anon``, its policy will default to
+``never``.
Hugepages in tmpfs/shmem
========================
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a12c011e2df..c5f4e97b49de 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
unsigned long huge_anon_orders_always __read_mostly;
unsigned long huge_anon_orders_madvise __read_mostly;
unsigned long huge_anon_orders_inherit __read_mostly;
+static bool anon_orders_configured;
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
@@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
* disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
* constant so we have to do this here.
*/
- huge_anon_orders_inherit = BIT(PMD_ORDER);
+ if (!anon_orders_configured) {
+ huge_anon_orders_inherit = BIT(PMD_ORDER);
+ anon_orders_configured = true;
+ }
*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
if (unlikely(!*hugepage_kobj)) {
@@ -922,6 +926,96 @@ static int __init setup_transparent_hugepage(char *str)
}
__setup("transparent_hugepage=", setup_transparent_hugepage);
+static inline int get_order_from_str(const char *size_str)
+{
+ unsigned long size;
+ char *endptr;
+ int order;
+
+ size = memparse(size_str, &endptr);
+ order = fls(size >> PAGE_SHIFT) - 1;
+ if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
+ pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
+ size_str, order);
+ return -EINVAL;
+ }
+
+ return order;
+}
+
+static char str_dup[PAGE_SIZE] __meminitdata;
+static int __init setup_thp_anon(char *str)
+{
+ char *token, *range, *policy, *subtoken;
+ unsigned long always, inherit, madvise;
+ char *start_size, *end_size;
+ int start, end;
+ char *p;
+
+ if (!str || strlen(str) + 1 > PAGE_SIZE)
+ goto err;
+ strcpy(str_dup, str);
+
+ always = huge_anon_orders_always;
+ madvise = huge_anon_orders_madvise;
+ inherit = huge_anon_orders_inherit;
+ p = str_dup;
+ while ((token = strsep(&p, ";")) != NULL) {
+ range = strsep(&token, ":");
+ policy = token;
+
+ if (!policy)
+ goto err;
+
+ while ((subtoken = strsep(&range, ",")) != NULL) {
+ if (strchr(subtoken, '-')) {
+ start_size = strsep(&subtoken, "-");
+ end_size = subtoken;
+
+ start = get_order_from_str(start_size);
+ end = get_order_from_str(end_size);
+ } else {
+ start = end = get_order_from_str(subtoken);
+ }
+
+ if (start < 0 || end < 0 || start > end)
+ goto err;
+
+ if (!strcmp(policy, "always")) {
+ bitmap_set(&always, start, end - start + 1);
+ bitmap_clear(&inherit, start, end - start + 1);
+ bitmap_clear(&madvise, start, end - start + 1);
+ } else if (!strcmp(policy, "madvise")) {
+ bitmap_set(&madvise, start, end - start + 1);
+ bitmap_clear(&inherit, start, end - start + 1);
+ bitmap_clear(&always, start, end - start + 1);
+ } else if (!strcmp(policy, "inherit")) {
+ bitmap_set(&inherit, start, end - start + 1);
+ bitmap_clear(&madvise, start, end - start + 1);
+ bitmap_clear(&always, start, end - start + 1);
+ } else if (!strcmp(policy, "never")) {
+ bitmap_clear(&inherit, start, end - start + 1);
+ bitmap_clear(&madvise, start, end - start + 1);
+ bitmap_clear(&always, start, end - start + 1);
+ } else {
+ pr_err("invalid policy %s in thp_anon boot parameter\n", policy);
+ goto err;
+ }
+ }
+ }
+
+ huge_anon_orders_always = always;
+ huge_anon_orders_madvise = madvise;
+ huge_anon_orders_inherit = inherit;
+ anon_orders_configured = true;
+ return 1;
+
+err:
+ pr_warn("thp_anon=%s: cannot parse, ignored\n", str);
+ return 0;
+}
+__setup("thp_anon=", setup_thp_anon);
+
pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
{
if (likely(vma->vm_flags & VM_WRITE))
--
2.34.1
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
From: Baolin Wang @ 2024-08-14 7:53 UTC
To: Barry Song, akpm, linux-mm
Cc: baohua, corbet, david, ioworker0, linux-kernel, ryan.roberts,
v-songbaohua
On 2024/8/14 10:02, Barry Song wrote:
> From: Ryan Roberts <ryan.roberts@arm.com>
>
> Add thp_anon= cmdline parameter to allow specifying the default
> enablement of each supported anon THP size. The parameter accepts the
> following format and can be provided multiple times to configure each
> size:
>
> thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
>
> An example:
>
> thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
>
> See Documentation/admin-guide/mm/transhuge.rst for more details.
>
> Configuring the defaults at boot time is useful to allow early user
> space to take advantage of mTHP before it's been configured through
> sysfs.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
LGTM. Feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Just a small nit as below.
> +static inline int get_order_from_str(const char *size_str)
> +{
> + unsigned long size;
> + char *endptr;
> + int order;
> +
> + size = memparse(size_str, &endptr);
> + order = fls(size >> PAGE_SHIFT) - 1;
Nit: using get_order() seems more robust?
> + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> + size_str, order);
> + return -EINVAL;
> + }
> +
> + return order;
> +}
[snip]
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
From: Barry Song @ 2024-08-14 8:09 UTC
To: baolin.wang
Cc: akpm, baohua, corbet, david, ioworker0, linux-kernel, linux-mm,
ryan.roberts, v-songbaohua
On Wed, Aug 14, 2024 at 7:53 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2024/8/14 10:02, Barry Song wrote:
> > From: Ryan Roberts <ryan.roberts@arm.com>
> >
> > Add thp_anon= cmdline parameter to allow specifying the default
> > enablement of each supported anon THP size. The parameter accepts the
> > following format and can be provided multiple times to configure each
> > size:
> >
> > thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
> >
> > An example:
> >
> > thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> >
> > See Documentation/admin-guide/mm/transhuge.rst for more details.
> >
> > Configuring the defaults at boot time is useful to allow early user
> > space to take advantage of mTHP before it's been configured through
> > sysfs.
> >
> > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>
> LGTM. Feel free to add:
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>
Thanks, Baolin!
> Just a small nit as below.
>
> > +static inline int get_order_from_str(const char *size_str)
> > +{
> > + unsigned long size;
> > + char *endptr;
> > + int order;
> > +
> > + size = memparse(size_str, &endptr);
> > + order = fls(size >> PAGE_SHIFT) - 1;
>
> Nit: using get_order() seems more robust?
Yes. I agree get_order() is better:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c5f4e97b49de..0f398d0dbaad 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -933,7 +933,7 @@ static inline int get_order_from_str(const char *size_str)
int order;
size = memparse(size_str, &endptr);
- order = fls(size >> PAGE_SHIFT) - 1;
+ order = get_order(size);
if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
size_str, order);
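As a rough illustration of how the two expressions in this diff differ, here is a userspace sketch that assumes 4 KiB pages. fls_ul(), order_via_fls() and order_via_get_order() are hypothetical names; the second mirrors what I understand the generic non-constant path of get_order() to be, i.e. fls of (size - 1) >> PAGE_SHIFT, and is not the kernel source itself.

```c
#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Position of the highest set bit, counting from 1; 0 when x is 0. */
int fls_ul(unsigned long x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* The v4 expression: rounds down, and yields -1 below one page. */
int order_via_fls(unsigned long size)
{
	return fls_ul(size >> PAGE_SHIFT) - 1;
}

/* Emulation of get_order() for non-zero sizes: rounds up. */
int order_via_get_order(unsigned long size)
{
	return fls_ul((size - 1) >> PAGE_SHIFT);
}
```

For power-of-two sizes of at least one page both return the same order. For 48K the first returns 3 and the second 4; for 3K the first returns -1, which would then feed a negative shift count into the (1 << order) check, while the second returns 0.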
>
> > + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> > + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> > + size_str, order);
> > + return -EINVAL;
> > + }
> > +
> > + return order;
> > +}
> [snip]
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
From: David Hildenbrand @ 2024-08-14 8:18 UTC
To: Barry Song, akpm, linux-mm
Cc: baohua, baolin.wang, corbet, ioworker0, linux-kernel,
ryan.roberts, v-songbaohua
On 14.08.24 04:02, Barry Song wrote:
> From: Ryan Roberts <ryan.roberts@arm.com>
>
> Add thp_anon= cmdline parameter to allow specifying the default
> enablement of each supported anon THP size. The parameter accepts the
> following format and can be provided multiple times to configure each
> size:
>
> thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
>
> An example:
>
> thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
>
> See Documentation/admin-guide/mm/transhuge.rst for more details.
>
> Configuring the defaults at boot time is useful to allow early user
> space to take advantage of mTHP before it's been configured through
> sysfs.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> +static inline int get_order_from_str(const char *size_str)
> +{
> + unsigned long size;
> + char *endptr;
> + int order;
> +
> + size = memparse(size_str, &endptr);
Do we have to also test if is_power_of_2(), and refuse if not? For
example, what if someone would pass 3K, would the existing check catch it?
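One way such a check could look, as a userspace sketch only and assuming 4 KiB pages: size_is_power_of_2() uses the same single-bit test as the kernel's is_power_of_2(), while checked_order() is a hypothetical helper, not part of the patch.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB base pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* True when exactly one bit is set. */
bool size_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: order for a valid size, -1 otherwise. */
int checked_order(unsigned long size)
{
	int order = 0;

	if (size < PAGE_SIZE || !size_is_power_of_2(size))
		return -1;
	for (size >>= PAGE_SHIFT; size > 1; size >>= 1)
		order++;
	return order;
}
```

With this, 3K and 48K are both refused up front instead of being silently mapped to a neighbouring order, while 64K and 2M map to orders 4 and 9.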
> + order = fls(size >> PAGE_SHIFT) - 1;
Is this a fancy way of writing
order = log2(size >> PAGE_SHIFT);
? :)
Anyhow, if get_order() wraps that, all good.
> + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> + size_str, order);
> + return -EINVAL;
> + }
> +
> + return order;
> +}
Apart from that, nothing jumped out at me.
--
Cheers,
David / dhildenb
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
From: Barry Song @ 2024-08-14 8:54 UTC
To: david
Cc: akpm, baohua, baolin.wang, corbet, ioworker0, linux-kernel,
linux-mm, ryan.roberts, v-songbaohua
On Wed, Aug 14, 2024 at 8:18 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 14.08.24 04:02, Barry Song wrote:
> > From: Ryan Roberts <ryan.roberts@arm.com>
> >
> > Add thp_anon= cmdline parameter to allow specifying the default
> > enablement of each supported anon THP size. The parameter accepts the
> > following format and can be provided multiple times to configure each
> > size:
> >
> > thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
> >
> > An example:
> >
> > thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> >
> > See Documentation/admin-guide/mm/transhuge.rst for more details.
> >
> > Configuring the defaults at boot time is useful to allow early user
> > space to take advantage of mTHP before it's been configured through
> > sysfs.
> >
> > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> > -v4:
> > * use bitmap APIs to set and clear bits. thanks very much for
> > David's comment!
> >
> > .../admin-guide/kernel-parameters.txt | 9 ++
> > Documentation/admin-guide/mm/transhuge.rst | 37 +++++--
> > mm/huge_memory.c | 96 ++++++++++++++++++-
> > 3 files changed, 134 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index f0057bac20fb..d0d141d50638 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -6629,6 +6629,15 @@
> > <deci-seconds>: poll all this frequency
> > 0: no polling (default)
> >
> > + thp_anon= [KNL]
> > + Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
> > + state is one of "always", "madvise", "never" or "inherit".
> > + Can be used to control the default behavior of the
> > + system with respect to anonymous transparent hugepages.
> > + Can be used multiple times for multiple anon THP sizes.
> > + See Documentation/admin-guide/mm/transhuge.rst for more
> > + details.
> > +
> > threadirqs [KNL,EARLY]
> > Force threading of all interrupt handlers except those
> > marked explicitly IRQF_NO_THREAD.
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 7072469de8a8..528e1a19d63f 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -284,13 +284,36 @@ that THP is shared. Exceeding the number would block the collapse::
> >
> > A higher value may increase memory footprint for some workloads.
> >
> > -Boot parameter
> > -==============
> > -
> > -You can change the sysfs boot time defaults of Transparent Hugepage
> > -Support by passing the parameter ``transparent_hugepage=always`` or
> > -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> > -to the kernel command line.
> > +Boot parameters
> > +===============
> > +
> > +You can change the sysfs boot time default for the top-level "enabled"
> > +control by passing the parameter ``transparent_hugepage=always`` or
> > +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> > +kernel command line.
> > +
> > +Alternatively, each supported anonymous THP size can be controlled by
> > +passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> > +where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> > +``madvise``, ``never`` or ``inherit``.
> > +
> > +For example, the following will set 16K, 32K, 64K THP to ``always``,
> > +set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> > +to ``never``::
> > +
> > + thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> > +
> > +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> > +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> > +not explicitly configured on the command line are implicitly set to
> > +``never``.
> > +
> > +``transparent_hugepage`` setting only affects the global toggle. If
> > +``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
> > +However, if a valid ``thp_anon`` setting is provided by the user, the
> > +PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
> > +is not defined within a valid ``thp_anon``, its policy will default to
> > +``never``.
> >
> > Hugepages in tmpfs/shmem
> > ========================
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 1a12c011e2df..c5f4e97b49de 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > unsigned long huge_anon_orders_always __read_mostly;
> > unsigned long huge_anon_orders_madvise __read_mostly;
> > unsigned long huge_anon_orders_inherit __read_mostly;
> > +static bool anon_orders_configured;
> >
> > unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > unsigned long vm_flags,
> > @@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
> > * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
> > * constant so we have to do this here.
> > */
> > - huge_anon_orders_inherit = BIT(PMD_ORDER);
> > + if (!anon_orders_configured) {
> > + huge_anon_orders_inherit = BIT(PMD_ORDER);
> > + anon_orders_configured = true;
> > + }
> >
> > *hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
> > if (unlikely(!*hugepage_kobj)) {
> > @@ -922,6 +926,96 @@ static int __init setup_transparent_hugepage(char *str)
> > }
> > __setup("transparent_hugepage=", setup_transparent_hugepage);
> >
> > +static inline int get_order_from_str(const char *size_str)
> > +{
> > + unsigned long size;
> > + char *endptr;
> > + int order;
> > +
> > + size = memparse(size_str, &endptr);
>
> Do we have to also test if is_power_of_2(), and refuse if not? For
> example, what if someone would pass 3K, would the existing check catch it?
No, the existing check can't catch it.

I passed thp_anon=15K-64K:always, and 16K ended up enabled:

/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
[always] inherit madvise never

I can check for that with:

static inline int get_order_from_str(const char *size_str)
{
	unsigned long size;
	char *endptr;
	int order;

	size = memparse(size_str, &endptr);

	if (!is_power_of_2(size >> PAGE_SHIFT))
		goto err;
	order = get_order(size);
	if ((1 << order) & ~THP_ORDERS_ALL_ANON)
		goto err;

	return order;
err:
	pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
	return -EINVAL;
}
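
To illustrate the rounding behind the 15K result, here is a minimal userspace sketch. It is not the kernel code: SKETCH_PAGE_SHIFT (4K base pages), sketch_is_power_of_2() and sketch_get_order() are stand-ins written for this example, and the loop only models the documented behaviour of get_order(), which returns the smallest order that can hold the given size. Under those assumptions, 15K rounds up to order 2 (16K), while a power-of-2 test on the page count rejects it.

```c
#include <stdbool.h>

/* Assumption for illustration: 4K base pages. */
#define SKETCH_PAGE_SHIFT 12

/* Userspace stand-in for the kernel's is_power_of_2(). */
static bool sketch_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Userspace stand-in for get_order(): the smallest order such that
 * (PAGE_SIZE << order) >= size, i.e. the size is rounded up.
 */
static int sketch_get_order(unsigned long size)
{
	int order = 0;

	while ((1UL << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}
```

With these helpers, sketch_get_order(15UL * 1024) and sketch_get_order(16UL * 1024) both yield order 2, which is why a separate power-of-2 test is needed to tell the two inputs apart.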
>
> > + order = fls(size >> PAGE_SHIFT) - 1;
>
> Is this a fancy way of writing
>
> order = log2(size >> PAGE_SHIFT);
>
> ? :)
I think ilog2() is implemented using fls()?
>
> Anyhow, if get_order() wraps that, all good.
I guess it doesn't check power of 2?
>
> > + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> > + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> > + size_str, order);
> > + return -EINVAL;
> > + }
> > +
> > + return order;
> > +}
>
> Apart from that, nothing jumped at me.
Please take a look at the new get_order_from_str() before I
send v5 :-)
>
> --
> Cheers,
>
> David / dhildenb
>
Thanks
Barry
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
2024-08-14 8:54 ` Barry Song
@ 2024-08-15 10:26 ` David Hildenbrand
2024-08-15 23:50 ` Barry Song
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2024-08-15 10:26 UTC (permalink / raw)
To: Barry Song
Cc: akpm, baohua, baolin.wang, corbet, ioworker0, linux-kernel,
linux-mm, ryan.roberts, v-songbaohua
>>> +static inline int get_order_from_str(const char *size_str)
>>> +{
>>> + unsigned long size;
>>> + char *endptr;
>>> + int order;
>>> +
>>> + size = memparse(size_str, &endptr);
>>
>> Do we have to also test if is_power_of_2(), and refuse if not? For
>> example, what if someone would pass 3K, would the existing check catch it?
>
> no, the existing check can't catch it.
>
> I passed thp_anon=15K-64K:always, then I got 16K enabled:
>
> / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
> [always] inherit madvise never
>
Okay, so we should document then that start/end of the range must be
valid THP sizes.
> I can actually check that by:
>
> static inline int get_order_from_str(const char *size_str)
> {
> unsigned long size;
> char *endptr;
> int order;
>
> size = memparse(size_str, &endptr);
>
> if (!is_power_of_2(size >> PAGE_SHIFT))
No need for the shift.

	if (!is_power_of_2(size))

is likely even more correct, in case someone manages to pass something
stupid like 16385 (16K + 1).
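
The difference between the two checks can be sketched in userspace as follows. This is only an illustration: SKETCH_PAGE_SHIFT (4K base pages), sketch_is_power_of_2(), accepts_shifted() and accepts_unshifted() are hypothetical names for this example, not kernel symbols. Shifting first discards the low 12 bits, so 16385 becomes 4 pages and passes, whereas testing the byte size directly rejects it.

```c
#include <stdbool.h>

/* Assumption for illustration: 4K base pages. */
#define SKETCH_PAGE_SHIFT 12

/* Userspace stand-in for the kernel's is_power_of_2(). */
static bool sketch_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Check as first proposed: low bits are shifted out before the test. */
static bool accepts_shifted(unsigned long size)
{
	return sketch_is_power_of_2(size >> SKETCH_PAGE_SHIFT);
}

/* Check as suggested in review: test the byte size itself. */
static bool accepts_unshifted(unsigned long size)
{
	return sketch_is_power_of_2(size);
}
```

Note that the unshifted check lets sub-page powers of 2 such as 1024 through, so it relies on the later order check to reject them.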
> goto err;
> order = get_order(size);
> if ((1 << order) & ~THP_ORDERS_ALL_ANON)
> goto err;
>
> return order;
> err:
> pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
> return -EINVAL;
> }
>
>>
>>> + order = fls(size >> PAGE_SHIFT) - 1;
>>
>> Is this a fancy way of writing
>>
>> order = log2(size >> PAGE_SHIFT);
>>
>> ? :)
>
> I think ilog2 is implemented by fls ?
Yes, so we should have used that instead. But get_order()
is even better.
>
>>
>> Anyhow, if get_order() wraps that, all good.
>
> I guess it doesn't check power of 2?
>
>>
>>> + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
>>> + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
>>> + size_str, order);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + return order;
>>> +}
>>
>> Apart from that, nothing jumped at me.
>
> Please take a look at the new get_order_from_str() before I
> send v5 :-)
Besides the shift for is_power_of_2(), LGTM, thanks!
--
Cheers,
David / dhildenb
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
2024-08-15 10:26 ` David Hildenbrand
@ 2024-08-15 23:50 ` Barry Song
2024-08-16 9:33 ` David Hildenbrand
0 siblings, 1 reply; 10+ messages in thread
From: Barry Song @ 2024-08-15 23:50 UTC (permalink / raw)
To: david
Cc: akpm, baohua, baolin.wang, corbet, ioworker0, linux-kernel,
linux-mm, ryan.roberts, v-songbaohua
On Thu, Aug 15, 2024 at 10:26 PM David Hildenbrand <david@redhat.com> wrote:
>
> >>> +static inline int get_order_from_str(const char *size_str)
> >>> +{
> >>> + unsigned long size;
> >>> + char *endptr;
> >>> + int order;
> >>> +
> >>> + size = memparse(size_str, &endptr);
> >>
> >> Do we have to also test if is_power_of_2(), and refuse if not? For
> >> example, what if someone would pass 3K, would the existing check catch it?
> >
> > no, the existing check can't catch it.
> >
> > I passed thp_anon=15K-64K:always, then I got 16K enabled:
> >
> > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
> > [always] inherit madvise never
> >
>
> Okay, so we should document then that start/end of the range must be
> valid THP sizes.
Ack
>
> > I can actually check that by:
> >
> > static inline int get_order_from_str(const char *size_str)
> > {
> > unsigned long size;
> > char *endptr;
> > int order;
> >
> > size = memparse(size_str, &endptr);
> >
> > if (!is_power_of_2(size >> PAGE_SHIFT))
>
> No need for the shift.
>
> if (!is_power_of_2(size))
>
> Is likely even more correct if someone would manage to pass something
> stupid like
>
> 16385 (16K + 1)
Ack
>
> > goto err;
> > order = get_order(size);
> > if ((1 << order) & ~THP_ORDERS_ALL_ANON)
> > goto err;
> >
> > return order;
> > err:
> > pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
> > return -EINVAL;
> > }
> >
> >>
> >>> + order = fls(size >> PAGE_SHIFT) - 1;
> >>
> >> Is this a fancy way of writing
> >>
> >> order = log2(size >> PAGE_SHIFT);
> >>
> >> ? :)
> >
> > I think ilog2 is implemented by fls ?
>
> Yes, so we should have used that instead. But get_order()
> is even better.
>
> >
> >>
> >> Anyhow, if get_order() wraps that, all good.
> >
> > I guess it doesn't check power of 2?
> >
> >>
> >>> + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> >>> + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> >>> + size_str, order);
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + return order;
> >>> +}
> >>
> >> Apart from that, nothing jumped at me.
> >
> > Please take a look at the new get_order_from_str() before I
> > send v5 :-)
>
> Besides the shift for is_power_of_2(), LGTM, thanks!
Thanks, David!
Hi Andrew,
Apologies for sending another squash request. If you'd
prefer me to send a new v5 that includes all the changes,
please let me know.
Don't shift the size, as it can still detect invalid sizes
like 16K+1. Also, document that the size must be a valid THP
size.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 15404f06eefd..4468851b6ecb 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -294,8 +294,9 @@ kernel command line.
Alternatively, each supported anonymous THP size can be controlled by
passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
-where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
-``madvise``, ``never`` or ``inherit``.
+where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
+supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
+``never`` or ``inherit``.
For example, the following will set 16K, 32K, 64K THP to ``always``,
set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d6dade8ac5f6..903b47f2b2db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -953,7 +953,7 @@ static inline int get_order_from_str(const char *size_str)
size = memparse(size_str, &endptr);
- if (!is_power_of_2(size >> PAGE_SHIFT))
+ if (!is_power_of_2(size))
goto err;
order = get_order(size);
if ((1 << order) & ~THP_ORDERS_ALL_ANON)
>
> --
> Cheers,
>
> David / dhildenb
>
Thanks
Barry
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
2024-08-15 23:50 ` Barry Song
@ 2024-08-16 9:33 ` David Hildenbrand
2024-08-16 9:47 ` Barry Song
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2024-08-16 9:33 UTC (permalink / raw)
To: Barry Song
Cc: akpm, baohua, baolin.wang, corbet, ioworker0, linux-kernel,
linux-mm, ryan.roberts, v-songbaohua
On 16.08.24 01:50, Barry Song wrote:
> On Thu, Aug 15, 2024 at 10:26 PM David Hildenbrand <david@redhat.com> wrote:
>>
>>>>> +static inline int get_order_from_str(const char *size_str)
>>>>> +{
>>>>> + unsigned long size;
>>>>> + char *endptr;
>>>>> + int order;
>>>>> +
>>>>> + size = memparse(size_str, &endptr);
>>>>
>>>> Do we have to also test if is_power_of_2(), and refuse if not? For
>>>> example, what if someone would pass 3K, would the existing check catch it?
>>>
>>> no, the existing check can't catch it.
>>>
>>> I passed thp_anon=15K-64K:always, then I got 16K enabled:
>>>
>>> / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
>>> [always] inherit madvise never
>>>
>>
>> Okay, so we should document then that start/end of the range must be
>> valid THP sizes.
>
> Ack
>
>>
>>> I can actually check that by:
>>>
>>> static inline int get_order_from_str(const char *size_str)
>>> {
>>> unsigned long size;
>>> char *endptr;
>>> int order;
>>>
>>> size = memparse(size_str, &endptr);
>>>
>>> if (!is_power_of_2(size >> PAGE_SHIFT))
>>
>> No need for the shift.
>>
>> if (!is_power_of_2(size))
>>
>> Is likely even more correct if someone would manage to pass something
>> stupid like
>>
>> 16385 (16K + 1)
>
> Ack
>
>>
>>> goto err;
>>> order = get_order(size);
>>> if ((1 << order) & ~THP_ORDERS_ALL_ANON)
>>> goto err;
>>>
>>> return order;
>>> err:
>>> pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
>>> return -EINVAL;
>>> }
>>>
>>>>
>>>>> + order = fls(size >> PAGE_SHIFT) - 1;
>>>>
>>>> Is this a fancy way of writing
>>>>
>>>> order = log2(size >> PAGE_SHIFT);
>>>>
>>>> ? :)
>>>
>>> I think ilog2 is implemented by fls ?
>>
>> Yes, so we should have used that instead. But get_order()
>> is even better.
>>
>>>
>>>>
>>>> Anyhow, if get_order() wraps that, all good.
>>>
>>> I guess it doesn't check power of 2?
>>>
>>>>
>>>>> + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
>>>>> + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
>>>>> + size_str, order);
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + return order;
>>>>> +}
>>>>
>>>> Apart from that, nothing jumped at me.
>>>
>>> Please take a look at the new get_order_from_str() before I
>>> send v5 :-)
>>
>> Besides the shift for is_power_of_2(), LGTM, thanks!
>
> Thanks, David!
>
> Hi Andrew,
>
> Apologies for sending another squash request. If you'd
> prefer me to send a new v5 that includes all the changes,
> please let me know.
>
>
> Don't shift the size, as it can still detect invalid sizes
> like 16K+1. Also, document that the size must be a valid THP
> size.
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 15404f06eefd..4468851b6ecb 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -294,8 +294,9 @@ kernel command line.
>
> Alternatively, each supported anonymous THP size can be controlled by
> passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> -where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> -``madvise``, ``never`` or ``inherit``.
> +where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
> +supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
> +``never`` or ``inherit``.
>
> For example, the following will set 16K, 32K, 64K THP to ``always``,
> set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d6dade8ac5f6..903b47f2b2db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -953,7 +953,7 @@ static inline int get_order_from_str(const char *size_str)
>
> size = memparse(size_str, &endptr);
>
> - if (!is_power_of_2(size >> PAGE_SHIFT))
> + if (!is_power_of_2(size))
> goto err;
Reading your documentation above, do we also want to test "if (size <
PAGE_SIZE)", or is that implicitly covered? (likely not I assume?)
I assume it's implicitly covered: if we pass "1k" , it would be mapped
to "4k" (order-0) and that is not a valid mTHP size, right?
I would appreciate a quick v5, just so I can see the final result more
easily :)
--
Cheers,
David / dhildenb
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
2024-08-16 9:33 ` David Hildenbrand
@ 2024-08-16 9:47 ` Barry Song
0 siblings, 0 replies; 10+ messages in thread
From: Barry Song @ 2024-08-16 9:47 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, baolin.wang, corbet, ioworker0, linux-kernel, linux-mm,
ryan.roberts, v-songbaohua
On Fri, Aug 16, 2024 at 9:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.08.24 01:50, Barry Song wrote:
> > On Thu, Aug 15, 2024 at 10:26 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>>> +static inline int get_order_from_str(const char *size_str)
> >>>>> +{
> >>>>> + unsigned long size;
> >>>>> + char *endptr;
> >>>>> + int order;
> >>>>> +
> >>>>> + size = memparse(size_str, &endptr);
> >>>>
> >>>> Do we have to also test if is_power_of_2(), and refuse if not? For
> >>>> example, what if someone would pass 3K, would the existing check catch it?
> >>>
> >>> no, the existing check can't catch it.
> >>>
> >>> I passed thp_anon=15K-64K:always, then I got 16K enabled:
> >>>
> >>> / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
> >>> [always] inherit madvise never
> >>>
> >>
> >> Okay, so we should document then that start/end of the range must be
> >> valid THP sizes.
> >
> > Ack
> >
> >>
> >>> I can actually check that by:
> >>>
> >>> static inline int get_order_from_str(const char *size_str)
> >>> {
> >>> unsigned long size;
> >>> char *endptr;
> >>> int order;
> >>>
> >>> size = memparse(size_str, &endptr);
> >>>
> >>> if (!is_power_of_2(size >> PAGE_SHIFT))
> >>
> >> No need for the shift.
> >>
> >> if (!is_power_of_2(size))
> >>
> >> Is likely even more correct if someone would manage to pass something
> >> stupid like
> >>
> >> 16385 (16K + 1)
> >
> > Ack
> >
> >>
> >>> goto err;
> >>> order = get_order(size);
> >>> if ((1 << order) & ~THP_ORDERS_ALL_ANON)
> >>> goto err;
> >>>
> >>> return order;
> >>> err:
> >>> pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
> >>> return -EINVAL;
> >>> }
> >>>
> >>>>
> >>>>> + order = fls(size >> PAGE_SHIFT) - 1;
> >>>>
> >>>> Is this a fancy way of writing
> >>>>
> >>>> order = log2(size >> PAGE_SHIFT);
> >>>>
> >>>> ? :)
> >>>
> >>> I think ilog2 is implemented by fls ?
> >>
> >> Yes, so we should have used that instead. But get_order()
> >> is even better.
> >>
> >>>
> >>>>
> >>>> Anyhow, if get_order() wraps that, all good.
> >>>
> >>> I guess it doesn't check power of 2?
> >>>
> >>>>
> >>>>> + if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> >>>>> + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> >>>>> + size_str, order);
> >>>>> + return -EINVAL;
> >>>>> + }
> >>>>> +
> >>>>> + return order;
> >>>>> +}
> >>>>
> >>>> Apart from that, nothing jumped at me.
> >>>
> >>> Please take a look at the new get_order_from_str() before I
> >>> send v5 :-)
> >>
> >> Besides the shift for is_power_of_2(), LGTM, thanks!
> >
> > Thanks, David!
> >
> > Hi Andrew,
> >
> > Apologies for sending another squash request. If you'd
> > prefer me to send a new v5 that includes all the changes,
> > please let me know.
> >
> >
> > Don't shift the size, as it can still detect invalid sizes
> > like 16K+1. Also, document that the size must be a valid THP
> > size.
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 15404f06eefd..4468851b6ecb 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -294,8 +294,9 @@ kernel command line.
> >
> > Alternatively, each supported anonymous THP size can be controlled by
> > passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> > -where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> > -``madvise``, ``never`` or ``inherit``.
> > +where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
> > +supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
> > +``never`` or ``inherit``.
> >
> > For example, the following will set 16K, 32K, 64K THP to ``always``,
> > set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d6dade8ac5f6..903b47f2b2db 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -953,7 +953,7 @@ static inline int get_order_from_str(const char *size_str)
> >
> > size = memparse(size_str, &endptr);
> >
> > - if (!is_power_of_2(size >> PAGE_SHIFT))
> > + if (!is_power_of_2(size))
> > goto err;
>
>
> Reading your documentation above, do we also want to test "if (size <
> PAGE_SIZE)", or is that implicitly covered? (likely not I assume?)
That is implicitly covered, because we also check that the order is valid:
a size smaller than PAGE_SIZE ends up with an invalid order.

static inline int get_order_from_str(const char *size_str)
{
	unsigned long size;
	char *endptr;
	int order;

	size = memparse(size_str, &endptr);

	if (!is_power_of_2(size))
		goto err;
	order = get_order(size);
	if ((1 << order) & ~THP_ORDERS_ALL_ANON)
		goto err;

	return order;
err:
	pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
	return -EINVAL;
}
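
A userspace sketch of why sub-page sizes are rejected without an explicit "size < PAGE_SIZE" test. Everything here is an assumption made for illustration: 4K base pages, a PMD order of 9, and SKETCH_ORDERS_ALL_ANON modelled on THP_ORDERS_ALL_ANON as "all orders up to PMD_ORDER except 0 and 1". The helper names are hypothetical, and -1 stands in for -EINVAL.

```c
#include <stdbool.h>

/* Assumptions for illustration: 4K base pages, 2M PMD (order 9). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_ORDER 9
#define SKETCH_BIT(n) (1UL << (n))
/* Modelled on THP_ORDERS_ALL_ANON: orders 2..PMD_ORDER. */
#define SKETCH_ORDERS_ALL_ANON \
	((SKETCH_BIT(SKETCH_PMD_ORDER + 1) - 1) & ~(SKETCH_BIT(0) | SKETCH_BIT(1)))

/* Userspace stand-in for the kernel's is_power_of_2(). */
static bool sketch_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Stand-in for get_order(): smallest order with (PAGE_SIZE << order) >= size. */
static int sketch_get_order(unsigned long size)
{
	int order = 0;

	while ((1UL << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Returns the order, or -1 where the kernel helper would return -EINVAL. */
static int sketch_order_from_size(unsigned long size)
{
	int order;

	if (!sketch_is_power_of_2(size))
		return -1;
	order = sketch_get_order(size);
	if (SKETCH_BIT(order) & ~SKETCH_ORDERS_ALL_ANON)
		return -1;
	return order;
}
```

In this model 1K is a power of 2 but maps to order 0, which the mask excludes, so it is rejected by the second check rather than the first.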
>
> I assume it's implicitly covered: if we pass "1k" , it would be mapped
> to "4k" (order-0) and that is not a valid mTHP size, right?
>
> I would appreciate a quick v5, just so can see the final result more
> easily :)
sure.
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline
2024-08-14 2:02 [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline Barry Song
2024-08-14 7:53 ` Baolin Wang
2024-08-14 8:18 ` David Hildenbrand
@ 2024-08-14 22:46 ` Barry Song
2 siblings, 0 replies; 10+ messages in thread
From: Barry Song @ 2024-08-14 22:46 UTC (permalink / raw)
To: akpm
Cc: baohua, baolin.wang, corbet, david, ioworker0, linux-kernel,
linux-mm, ryan.roberts, v-songbaohua
On Wed, Aug 14, 2024 at 2:03 PM Barry Song <21cnbao@gmail.com> wrote:
>
> From: Ryan Roberts <ryan.roberts@arm.com>
>
> Add thp_anon= cmdline parameter to allow specifying the default
> enablement of each supported anon THP size. The parameter accepts the
> following format and can be provided multiple times to configure each
> size:
>
> thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
>
> An example:
>
> thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
>
> See Documentation/admin-guide/mm/transhuge.rst for more details.
>
> Configuring the defaults at boot time is useful to allow early user
> space to take advantage of mTHP before its been configured through
> sysfs.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
Hi Andrew,
I saw you have pulled v4. Thanks!
Can you please squash the below changes suggested by Baolin and David?
From af42aa80f45d89798027e44a8711f7737e08b115 Mon Sep 17 00:00:00 2001
From: Barry Song <v-songbaohua@oppo.com>
Date: Thu, 15 Aug 2024 10:34:16 +1200
Subject: [PATCH] mm: use get_order() and check size with is_power_of_2()
Using get_order() is more robust, according to Baolin.
It is also better to filter out illegal sizes such as 3KB or
15KB, according to David.
Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
mm/huge_memory.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 01beda16aece..d6dade8ac5f6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -952,14 +952,17 @@ static inline int get_order_from_str(const char *size_str)
int order;
size = memparse(size_str, &endptr);
- order = fls(size >> PAGE_SHIFT) - 1;
- if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
- pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
- size_str, order);
- return -EINVAL;
- }
+
+ if (!is_power_of_2(size >> PAGE_SHIFT))
+ goto err;
+ order = get_order(size);
+ if ((1 << order) & ~THP_ORDERS_ALL_ANON)
+ goto err;
return order;
+err:
+ pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
+ return -EINVAL;
}
static char str_dup[PAGE_SIZE] __meminitdata;
--
2.34.1
Thanks
Barry
end of thread, other threads:[~2024-08-16 9:47 UTC | newest]
Thread overview: 10+ messages
2024-08-14 2:02 [PATCH v4] mm: Override mTHP "enabled" defaults at kernel cmdline Barry Song
2024-08-14 7:53 ` Baolin Wang
2024-08-14 8:09 ` Barry Song
2024-08-14 8:18 ` David Hildenbrand
2024-08-14 8:54 ` Barry Song
2024-08-15 10:26 ` David Hildenbrand
2024-08-15 23:50 ` Barry Song
2024-08-16 9:33 ` David Hildenbrand
2024-08-16 9:47 ` Barry Song
2024-08-14 22:46 ` Barry Song