From: Mike Kravetz <mike.kravetz@oracle.com>
To: David Hildenbrand <david@redhat.com>,
	Peng Liu <liupeng256@huawei.com>,
	akpm@linux-foundation.org, yaozhenguo1@gmail.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
Date: Mon, 4 Apr 2022 16:48:35 -0700
Message-ID: <d3b98e2b-2148-172a-358c-e7ab1e444c3b@oracle.com>
In-Reply-To: <e3889061-4681-0618-5291-05b9559e0e10@redhat.com>

On 4/4/22 03:41, David Hildenbrand wrote:
> On 01.04.22 19:23, Mike Kravetz wrote:
>> On 4/1/22 03:43, David Hildenbrand wrote:
>>> On 01.04.22 12:12, Peng Liu wrote:
>>>> Hugepages can be specified per node since "hugetlbfs: extend
>>>> the definition of hugepages parameter to support node allocation",
>>>> but the following problem is observed.
>>>>
>>>> Confusing behavior is observed when both 1G and 2M hugepages are set
>>>> after "numa=off".
>>>> cmdline hugepage settings:
>>>> hugepagesz=1G hugepages=0:3,1:3
>>>> hugepagesz=2M hugepages=0:1024,1:1024
>>>> results:
>>>> HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>>> HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>>>
>>>> Furthermore, confusing behavior can also be observed when an invalid
>>>> node follows a valid node.
>>>>
>>>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>>>> hugepages_setup goes to invalid.
>>>
>>> Shouldn't we bail out if someone requests node-specific allocations but
>>> we are not running with NUMA?
>>
>> I thought about this as well, and could not come up with a good answer.
>> Certainly, nobody SHOULD specify both 'numa=off' and ask for node specific
>> allocations on the same command line. I would have no problem bailing out
>> in such situations. But, I think that would also require the hugetlb command
>> line processing to look for such situations.
>
> Yes. Right now I see
>
> 	if (tmp >= nr_online_nodes)
> 		goto invalid;
>
> Which seems a little strange, because IIUC, it's the number of online
> nodes, which is completely wrong with a sparse online bitmap. Just
> imagine node 0 and node 2 are online, and node 1 is offline. Assuming
> that "node < 2" is valid is wrong.
>
> Why don't we check for node_online() and bail out if that is not the
> case? Is it too early for that check? But why does comparing against
> nr_online_nodes work, then?
>
>
> Having that said, I'm not sure if all usage of nr_online_nodes in
> mm/hugetlb.c is wrong, with a sparse online bitmap. Outside of that,
> it's really just used for "nr_online_nodes > 1". I might be wrong, though.

I think you are correct. My bad for not being more thorough in reviewing
the original patch that added this code. My incorrect assumption was that
a sparse node map was only possible via offline operations, which could not
happen this early in boot. I now see that a sparse map can be presented
by firmware/BIOS/etc. So yes, I do believe we need to check for online nodes.
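
To make the sparse-map case concrete, here is a minimal userspace sketch
(not kernel code) of the two checks being compared. The struct and the
sketch_*/valid_* helpers are hypothetical stand-ins for the kernel's node
state, invented for illustration only. It assumes a plain bitmask where
bit n set means node n is online.

```c
#include <stdbool.h>

/* Hypothetical limit for this sketch; unrelated to the kernel's MAX_NUMNODES. */
#define SKETCH_MAX_NODES 8

/* Hypothetical stand-in for the online node bitmap: bit n set => node n online. */
struct sketch_node_map {
	unsigned long online_mask;
};

/* Stand-in for a per-node online test. */
static bool sketch_node_online(const struct sketch_node_map *m, unsigned int nid)
{
	return nid < SKETCH_MAX_NODES && (m->online_mask & (1UL << nid)) != 0;
}

/* Stand-in for the count of online nodes. */
static unsigned int sketch_nr_online_nodes(const struct sketch_node_map *m)
{
	unsigned int nid, n = 0;

	for (nid = 0; nid < SKETCH_MAX_NODES; nid++)
		if (m->online_mask & (1UL << nid))
			n++;
	return n;
}

/* The check quoted above: treat "nid < number of online nodes" as valid. */
static bool valid_by_count(const struct sketch_node_map *m, unsigned int nid)
{
	return nid < sketch_nr_online_nodes(m);
}

/* The suggested alternative: ask whether this particular node is online. */
static bool valid_by_online(const struct sketch_node_map *m, unsigned int nid)
{
	return sketch_node_online(m, nid);
}
```

With nodes 0 and 2 online and node 1 offline (online_mask = 0x5), the
count-based check accepts node 1 and rejects node 2, while the per-node
check gets both right.
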
--
Mike Kravetz
>
>>
>> One could also argue that if there is only a single node (not numa=off on
>> command line) and someone specifies node local allocations we should bail.
>
> I assume "numa=off" is always parsed before hugepages_setup() is called,
> right? So we can just rely on the actual numa information.
>
>
>>
>> I was 'thinking' about a situation where we had multiple nodes and node
>> local allocations were 'hard coded' via grub or something. Then, for some
>> reason one node fails to come up on a reboot. Should we bail on all the
>> hugetlb allocations, or should we try to allocate on the still available
>> nodes?
>
> Depends on what "bail" means. Printing a warning and stopping further
> allocation is certainly good enough for my taste :)
>
>>
>> When I went back and reread the reason for this change, I see that it is
>> primarily for 'some debugging and test cases'.
>>
>>>
>>> What's the result after your change?
>>>
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>
>>> I am not sure if this is really stable material.
>>
>> Right now, we partially and inconsistently process node specific allocations
>> if there are missing nodes. We allocate 'regular' hugetlb pages on existing
>> nodes. But, we do not allocate gigantic hugetlb pages on existing nodes.
>>
>> I believe this is worth fixing in stable.
>
> I am skeptical.
>
> https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst
>
> " - It must fix a real bug that bothers people (not a, "This could be a
> problem..." type thing)."
>
> While the current behavior is suboptimal, it's certainly not an urgent
> bug (?) and the kernel will boot and work just fine. As you mentioned
> "nobody SHOULD specify both 'numa=off' and ask for node specific
> allocations on the same command line.", this is just a corner case.
>
> Adjusting it upstream -- okay. Backporting to stable? I don't think so.
>
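
For reference, the "warn and stop, but keep what was already valid"
handling discussed above could look roughly like the following userspace
sketch. This is not the parser in mm/hugetlb.c; the function name, the
SKETCH_MAX_NODES limit and the bitmask representation are all assumptions
made for illustration. It parses a "node:count[,node:count...]" string and
stops at the first malformed entry or offline node.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical limit for this sketch only. */
#define SKETCH_MAX_NODES 8

/*
 * Parse "node:count[,node:count...]" into counts[], indexed by node id.
 * online_mask has bit n set when node n is online. Parsing stops at the
 * first malformed entry or offline node; entries accepted before that
 * point are kept. Returns the number of entries accepted.
 */
static int sketch_parse_pernode(const char *s, unsigned long online_mask,
				unsigned long counts[SKETCH_MAX_NODES])
{
	int accepted = 0;

	while (*s) {
		char *end;
		unsigned long nid, count;

		nid = strtoul(s, &end, 10);
		if (end == s || *end != ':')
			break;			/* malformed node id */
		if (nid >= SKETCH_MAX_NODES || !(online_mask & (1UL << nid))) {
			fprintf(stderr,
				"sketch: node %lu is not online, ignoring the rest\n",
				nid);
			break;
		}

		s = end + 1;
		count = strtoul(s, &end, 10);
		if (end == s)
			break;			/* malformed count */

		counts[nid] = count;
		accepted++;

		if (*end != ',')
			break;			/* end of list or trailing junk */
		s = end + 1;
	}
	return accepted;
}
```

With only node 0 online (as with numa=off), "0:3,1:3" yields one accepted
entry: node 0 keeps its 3 pages, and the request for node 1 produces a
warning instead of silently discarding everything.
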