From: Barry Song <21cnbao@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
corbet@lwn.net, ioworker0@gmail.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
ryan.roberts@arm.com
Subject: Re: [PATCH v2] mm: Override mTHP "enabled" defaults at kernel cmdline
Date: Mon, 12 Aug 2024 20:36:33 +1200 [thread overview]
Message-ID: <CAGsJ_4ziAchNaqHXmXFLa56t5PxfayDwkrdfR9OzRc35t3PfAg@mail.gmail.com> (raw)
In-Reply-To: <08390990-b47a-4663-8eae-ee51aac55b45@redhat.com>
On Mon, Aug 12, 2024 at 8:20 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.08.24 07:36, Barry Song wrote:
> > On Sat, Aug 10, 2024 at 1:15 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>>>>> -You can change the sysfs boot time defaults of Transparent Hugepage
> >>>>>>> -Support by passing the parameter ``transparent_hugepage=always`` or
> >>>>>>> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> >>>>>>> -to the kernel command line.
> >>>>>>> +You can change the sysfs boot time default for the top-level "enabled"
> >>>>>>> +control by passing the parameter ``transparent_hugepage=always`` or
> >>>>>>> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> >>>>>>> +kernel command line.
> >>>>>>> +
> >>>>>>> +Alternatively, each supported anonymous THP size can be controlled by
> >>>>>>> +passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
> >>>>>>> +and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
> >>>>>>> +``inherit``.
> >>>>>>> +
> >>>>>>> +For example, the following will set 64K THP to ``always``::
> >>>>>>> +
> >>>>>>> + thp_anon=64K:always
> >>>>>>> +
> >>>>>>> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> >>>>>>> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> >>>>>>> +not explicitly configured on the command line are implicitly set to
> >>>>>>> +``never``.
> >>>>>>
> >>>>>> I suggest documenting that "thp_anon=" will not effect the value of
> >>>>>> "transparent_hugepage=", or any configured default.
> >>>
> >>> Did you see the previous conversation with Barry about whether or not to honour
> >>> configured defaults when any thp_anon= is provided [1]? Sounds like you also
> >>> think we should honour the PMD "inherit" default if not explicitly provided on
> >>> the command line? (see link for justification for the approach I'm currently
> >>> taking).
> >>
> >> I primarily think that we should document it :)
> >>
> >> What if someone passes "transparent_hugepage=always" and "thp_anon=..."?
> >> I would assume that transparent_hugepage would only affect the global
> >> toggle then?
> >>
> >>>
> >>> [1]
> >>> https://lore.kernel.org/linux-mm/CAGsJ_4x8ruPspuk_FQVggJMWcXLbRuZFq44gg-Dt7Ewt3ExqTw@mail.gmail.com/
> >>>
> >>>>>>
> >>>>>> Wondering if a syntax like
> >>>>>>
> >>>>>> thp_anon=16K,32K,64K:always;1048K,2048K:madvise
> >>>
> >>> Are there examples of that syntax already or have you just made it up? I found
> >>> examples with the colon (:) but nothing this fancy. I guess that's not a reason
> >>> not to do it though (other than the risk of screwing up the parser in a subtle way).
> >>
> >> I made it up -- mostly ;) I think we are quite flexible on what we can
> >> do. As always, maybe we can keep it bit consistent with existing stuff.
> >>
> >> For hugetlb_cma we have things like
> >> "<node>:nn[KMGTPE|[,<node>:nn[KMGTPE]]
> >>
> >> "memmap=" options are more ... advanced, including memory ranges. There
> >> are a bunch more documented in kernel-parameters.txt that have more
> >> elaborate formats.
> >>
> >> Ranges would probably be the most valuable addition. So maybe we should
> >> start with:
> >>
> >> thp_anon=16K-64K:always,1048K-2048K:madvise
> >>
> >> So to enable all THPs it would simply be
> >>
> >> thp_anon=16K-2M:always
> >>
> >> Interesting question what would happen if someone passes:
> >>
> >> thp_anon=8K-2M:always
> >>
> >> Likely we simply would apply it to any size in the range, even if
> >> start/end is not a THP size.
> >>
> >> But we would want to complain to the user if someone only specifies a
> >> single one (or a range does not even select a single one) that does not
> >> exist:
> >>
> >> thp_anon=8K:always
> >
> > How about rejecting settings with any illegal size or policy?
> >
> > I found there are too many corner cases to say "yes" or "no". It seems
> > the code could be much cleaner if we simply reject illegal settings.
> > On the other hand, we should rely on smart users to provide correct
> > settings, shouldn’t we? While working one the code, I felt that
> > extracting partial correct settings from incorrect ones and supporting
> > them might be a bit of over-design.
>
> No strong opinion on failing configs. Ignoring non-existing sizes might
> lead to more portable cmdlines between kernel versions.
Well, I realized that the code I posted has actually applied the correct
parts because it modifies the global variable huge_anon_orders_xxx.
Unless I use temporary variables and then copy the results to
global ones, the code has been this issue to some extent.
maybe I should remove the below "goto err", instead, ignore the
incorrect strings and go to the next strings to parse.
static int __init setup_thp_anon(char *str)
{
...
if (!strcmp(policy, "always")) {
set_bits(&huge_anon_orders_always, start, end);
clear_bits(&huge_anon_orders_inherit,
start, end);
clear_bits(&huge_anon_orders_madvise,
start, end);
} else if (!strcmp(policy, "madvise")) {
...
} else if (!strcmp(policy, "inherit")) {
...
} else if (!strcmp(policy, "never")) {
...
} else {
/* goto err; */ -> pr_err
}
}
}
...
err:
pr_warn("thp_anon=%s: cannot parse(invalid size or policy),
ignored\n", str);
return 0;
}
>
> >
> > I have tested the below code by
> >
> > "thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never"
>
> There are some parameters that separate individual options by ";"
> already (config_acs). Most parameters use ",". No idea what's better
> here, and which separator to use for sizes instead when using "," for
> options. No strong opinion.
But we have used "," for the purpose "128K,512K:inherit". so here
we have to use ";" for "16K-64K:always;128K,512K:inherit"
>
>
> --
> Cheers,
>
> David / dhildenb
Thanks
Barry
next prev parent reply other threads:[~2024-08-12 8:36 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240808101700.571701-1-ryan.roberts@arm.com>
2024-08-08 21:17 ` Barry Song
2024-08-09 7:58 ` Ryan Roberts
2024-08-09 8:31 ` Barry Song
2024-08-09 8:57 ` Ryan Roberts
2024-08-09 9:04 ` Barry Song
2024-08-09 8:49 ` Baolin Wang
2024-08-09 8:52 ` Barry Song
2024-08-09 9:19 ` David Hildenbrand
2024-08-09 9:24 ` Barry Song
2024-08-09 9:32 ` David Hildenbrand
2024-08-09 10:37 ` Ryan Roberts
2024-08-09 11:37 ` Barry Song
2024-08-09 13:15 ` David Hildenbrand
2024-08-12 5:36 ` Barry Song
2024-08-12 8:20 ` David Hildenbrand
2024-08-12 8:36 ` Barry Song [this message]
2024-08-12 9:05 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGsJ_4ziAchNaqHXmXFLa56t5PxfayDwkrdfR9OzRc35t3PfAg@mail.gmail.com \
--to=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=corbet@lwn.net \
--cc=david@redhat.com \
--cc=ioworker0@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryan.roberts@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox