linux-mm.kvack.org archive mirror
From: Mike Kravetz <mike.kravetz@oracle.com>
To: David Hildenbrand <david@redhat.com>,
	Peng Liu <liupeng256@huawei.com>,
	akpm@linux-foundation.org, yaozhenguo1@gmail.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
Date: Mon, 4 Apr 2022 16:48:35 -0700
Message-ID: <d3b98e2b-2148-172a-358c-e7ab1e444c3b@oracle.com>
In-Reply-To: <e3889061-4681-0618-5291-05b9559e0e10@redhat.com>

On 4/4/22 03:41, David Hildenbrand wrote:
> On 01.04.22 19:23, Mike Kravetz wrote:
>> On 4/1/22 03:43, David Hildenbrand wrote:
>>> On 01.04.22 12:12, Peng Liu wrote:
>>>> Hugepages can be specified per node since "hugetlbfs: extend
>>>> the definition of hugepages parameter to support node allocation",
>>>> but the following problem is observed.
>>>>
>>>> Confusing behavior is observed when both 1G and 2M hugepages are set
>>>> after "numa=off".
>>>>  cmdline hugepage settings:
>>>>   hugepagesz=1G hugepages=0:3,1:3
>>>>   hugepagesz=2M hugepages=0:1024,1:1024
>>>>  results:
>>>>   HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>>>   HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>>>
>>>> Furthermore, confusing behavior can also be observed when an invalid
>>>> node is specified after a valid node.
>>>>
>>>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>>>> hugepages_setup takes the "invalid" path.
>>>
>>> Shouldn't we bail out if someone requests node-specific allocations but
>>> we are not running with NUMA?
>>
>> I thought about this as well, and could not come up with a good answer.
>> Certainly, nobody SHOULD specify both 'numa=off' and ask for node specific
>> allocations on the same command line.  I would have no problem bailing out
>> in such situations.  But, I think that would also require the hugetlb command
>> line processing to look for such situations.
> 
> Yes. Right now I see
> 
> if (tmp >= nr_online_nodes)
> 	goto invalid;
> 
> Which seems a little strange, because IIUC, it's the number of online
> nodes, which is completely wrong with a sparse online bitmap. Just
> imagine node 0 and node 2 are online, and node 1 is offline. Assuming
> that "node < 2" is valid is wrong.
> 
> Why don't we check for node_online() and bail out if that is not the
> case? Is it too early for that check? But why does comparing against
> nr_online_nodes work, then?
> 
> 
> That said, I'm not sure if all usage of nr_online_nodes in
> mm/hugetlb.c is wrong with a sparse online bitmap. Outside of that,
> it's really just used for "nr_online_nodes > 1". I might be wrong, though.

I think you are correct.  My bad for not being more thorough in reviewing
the original patch that added this code.  My incorrect assumption was that
a sparse node map was only possible via offline operations, which could not
happen this early in boot.  I now see that a sparse map can be presented
by firmware/BIOS/etc.  So yes, I do believe we need to check for online nodes.
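
To make the failure mode concrete, here is a rough userspace sketch of the
sparse case described above (nodes 0 and 2 online, node 1 offline).  This is
not kernel code: online_mask, mark_online(), count_online(), valid_by_count()
and valid_by_online() are all made-up stand-ins, with count_online() playing
the role of nr_online_nodes and valid_by_online() the role of a node_online()
style check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
#define MAX_NODES 8u

static unsigned int online_mask;	/* bit n set => node n is online */

static void mark_online(unsigned int nid)
{
	assert(nid < MAX_NODES);
	online_mask |= 1u << nid;
}

/* Plays the role of nr_online_nodes: a count, not the highest node id. */
static unsigned int count_online(void)
{
	unsigned int n = 0;

	for (unsigned int nid = 0; nid < MAX_NODES; nid++)
		if (online_mask & (1u << nid))
			n++;
	return n;
}

/* Count-based check: node id is accepted iff it is below the count. */
static bool valid_by_count(unsigned int nid)
{
	return nid < count_online();
}

/* Per-node check: node id is accepted iff that specific node is online. */
static bool valid_by_online(unsigned int nid)
{
	return nid < MAX_NODES && (online_mask & (1u << nid));
}
```

With nodes 0 and 2 marked online, the count is 2, so valid_by_count()
accepts offline node 1 and rejects online node 2, while valid_by_online()
gets both right.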

-- 
Mike Kravetz

> 
>>
>> One could also argue that if there is only a single node (not numa=off on
>> command line) and someone specifies node local allocations we should bail.
> 
> I assume "numa=off" is always parsed before hugepages_setup() is called,
> right? So we can just rely on the actual numa information.
> 
> 
>>
>> I was 'thinking' about a situation where we had multiple nodes and node
>> local allocations were 'hard coded' via grub or something.  Then, for some
>> reason one node fails to come up on a reboot.  Should we bail on all the
>> hugetlb allocations, or should we try to allocate on the still available
>> nodes?
> 
> Depends on what "bail" means. Printing a warning and not allocating
> any further is certainly good enough for my taste :)
> 
>>
>> When I went back and reread the reason for this change, I see that it is
>> primarily for 'some debugging and test cases'.
>>
>>>
>>> What's the result after your change?
>>>
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>
>>> I am not sure if this is really stable material.
>>
>> Right now, we partially and inconsistently process node specific allocations
>> if there are missing nodes.  We allocate 'regular' hugetlb pages on existing
>> nodes.  But, we do not allocate gigantic hugetlb pages on existing nodes.
>>
>> I believe this is worth fixing in stable.
> 
> I am skeptical.
> 
> https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst
> 
> " - It must fix a real bug that bothers people (not a, "This could be a
>    problem..." type thing)."
> 
> While the current behavior is suboptimal, it's certainly not an urgent
> bug (?) and the kernel will boot and work just fine. As you mentioned
> "nobody SHOULD specify both 'numa=off' and ask for node specific
> allocations on the same command line.", this is just a corner case.
> 
> Adjusting it upstream -- okay. Backporting to stable? I don't think so.
> 



Thread overview: 11+ messages
2022-04-01 10:12 [PATCH v2 0/2] hugetlb: Fix confusing behavior Peng Liu
2022-04-01 10:12 ` [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode Peng Liu
2022-04-01 10:43   ` David Hildenbrand
2022-04-01 17:23     ` Mike Kravetz
2022-04-04 10:41       ` David Hildenbrand
2022-04-04 23:48         ` Mike Kravetz [this message]
2022-04-02  2:36     ` liupeng (DM)
2022-04-01 10:12 ` [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers Peng Liu
2022-04-01 10:46   ` David Hildenbrand
2022-04-02  1:33     ` liupeng (DM)
2022-04-04 10:25       ` David Hildenbrand
