From: Nitesh Narayan Lal <nitesh@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
"Longpeng (Mike)" <longpeng2@huawei.com>
Cc: arei.gonglei@huawei.com, huangzhichao@huawei.com,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>, Qian Cai <cai@lca.pw>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm/hugetlb: avoid weird message in hugetlb_init
Date: Fri, 10 Apr 2020 11:47:46 -0400
Message-ID: <641eae15-1ea7-c573-0d64-09dcccc1717d@redhat.com>
In-Reply-To: <43017337-fe28-16e0-fbdd-d6368bdd2eb2@oracle.com>
On 3/6/20 3:12 PM, Mike Kravetz wrote:
> On 3/5/20 10:36 PM, Longpeng (Mike) wrote:
>> On 2020/3/6 8:09, Mike Kravetz wrote:
>>> On 3/4/20 7:30 PM, Longpeng(Mike) wrote:
>>>> From: Longpeng <longpeng2@huawei.com>
>>> I am thinking we may want to have a more generic solution by allowing
>>> the default_hugepagesz= processing code to verify the passed size and
>>> set up the corresponding hstate. This would require more cooperation
>>> between architecture specific and independent code. This could be
>>> accomplished with a simple arch_hugetlb_valid_size() routine provided
>>> by the architectures. Below is an untested patch to add such support
>>> to the architecture independent code and x86. Other architectures would
>>> be similar.
>>>
>>> In addition, with architectures providing arch_hugetlb_valid_size() it
>>> should be possible to have a common routine in architecture independent
>>> code to read/process hugepagesz= command line arguments.
>>>
>> I just wanted to make the minimal change needed to address this issue, so I
>> chose the approach taken in my patch.
>>
>> To be honest, the approach you suggested above is much better, though it needs
>> more changes.
>>
>>> Of course, another approach would be to simply require ALL architectures
>>> to set up hstates for ALL supported huge page sizes.
>>>
>> I think this is also needed; then we can request all supported hugepage sizes
>> dynamically through sysfs (e.g. /sys/kernel/mm/hugepages/*). Currently, on x86, we
>> can only request 1G hugepages through sysfs if we boot with 'default_hugepagesz=1G',
>> even with the first approach.
> I 'think' you can use sysfs for 1G huge pages on x86 today. Just booted a
> system without any hugepage options on the command line.
>
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> 0
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/^Cugepages
> # echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> 1
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> 1
>
> x86 and riscv will set up an hstate for PUD_SIZE huge pages by default if
> CONFIG_CONTIG_ALLOC is enabled. This is because of a somewhat recent feature that
> allowed dynamic allocation of gigantic (page order >= MAX_ORDER) pages.
> Before that feature, it made no sense to set up an hstate for gigantic
> pages if they were not allocated at boot time and could not be dynamically
> added later.
>
> I'll code up a proposal that does the following:
> - Have arch specific code provide a list of supported huge page sizes
> - Arch independent code uses list to create all hstates
> - Move processing of "hugepagesz=" to arch independent code
> - Validate "default_hugepagesz=" when value is read from command line
>
> It may take a few days. When ready, I will pull in the architecture
> specific people.
Hi Mike,
On platforms that support multiple huge page sizes, hugepages are not allocated
when 'hugepagesz=' is not specified before 'hugepages=' (for example, if we are
requesting 1GB hugepages).
In terms of reporting, meminfo and /sys/kernel/../nr_hugepages show the expected
results, but sysctl vm.nr_hugepages reports a non-zero value because it reads
max_huge_pages from the default hstate instead of nr_huge_pages.
AFAIK nr_huge_pages is the one that indicates the number of huge pages that are
successfully allocated.
Is vm.nr_hugepages expected to report the maximum number of hugepages? If
so, would it not make sense to rename the procname?
However, if we expect nr_hugepages to report the number of successfully
allocated hugepages, then we should use nr_huge_pages in
hugetlb_sysctl_handler_common().
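To make the discrepancy concrete, here is a toy userspace model in Python. It is
only a sketch of the reporting difference described above, not kernel code: the
HState class and all function names are made up for illustration, and the only
thing taken from the kernel is the pair of counter names, max_huge_pages and
nr_huge_pages.

```python
from dataclasses import dataclass


@dataclass
class HState:
    """Toy stand-in for the two per-hstate counters discussed above."""
    max_huge_pages: int = 0  # pages requested, e.g. via hugepages=
    nr_huge_pages: int = 0   # pages that were actually allocated


def boot_request(h: HState, requested: int, allocated: int) -> None:
    """Record a boot-time request together with what really got allocated.

    'allocated' is an input of this toy model; passing 0 mimics the case
    where the allocation never happens.
    """
    h.max_huge_pages = requested
    h.nr_huge_pages = min(requested, allocated)


def sysfs_report(h: HState) -> int:
    """Models the sysfs and meminfo view: pages actually allocated."""
    return h.nr_huge_pages


def sysctl_report_current(h: HState) -> int:
    """Models the behaviour described above: the requested maximum is shown."""
    return h.max_huge_pages


def sysctl_report_proposed(h: HState) -> int:
    """Models the suggested alternative: report allocated pages instead."""
    return h.nr_huge_pages
```

With boot_request(h, requested=4, allocated=0), the sysfs view in this model
reads 0 while the current sysctl view reads 4; the proposed variant would read 0
as well.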
>
>> BTW, because the time difference makes it hard to discuss this with you, I
>> have another question about the default hugepages to ask here. Why does
>> /proc/meminfo only show information about the default hugepages, but not the others?
>> meminfo is better known than sysfs; some ordinary users know meminfo but don't
>> know how to use sysfs to get the hugepage status (e.g. total, free).
> I believe that is simply history. In the beginning there was only the
> default huge page size and that was added to meminfo. People then wrote
> scripts to parse huge page information in meminfo. When support for
> other huge pages was added, it was not added to meminfo as it could break
> user scripts parsing the file. Adding information for all potential
> huge page sizes may create lots of entries that are unused. I was not
> around when these decisions were made, but that is my understanding.
> BTW - A recently added meminfo field 'Hugetlb' displays the amount of
> memory consumed by huge pages of ALL sizes.
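As a rough illustration of the meminfo point above, the sketch below parses
meminfo-style text and subtracts the default-size pool (HugePages_Total times
Hugepagesize) from the 'Hugetlb' total to estimate what the other sizes
consume. The function names and the sample numbers are made up for this
example; only the field names come from /proc/meminfo.

```python
def parse_meminfo(text: str) -> dict:
    """Map each meminfo field name to its integer value (unit suffix dropped)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(parts[0])
    return fields


def default_pool_kb(fields: dict) -> int:
    """Memory held by huge pages of the default size, in kB."""
    return fields["HugePages_Total"] * fields["Hugepagesize"]


def other_pools_kb(fields: dict) -> int:
    """Memory held by huge pages of all non-default sizes, in kB."""
    return fields["Hugetlb"] - default_pool_kb(fields)


# Made-up sample: four 2 MB pages (default size) plus one 1 GB page.
SAMPLE = """\
HugePages_Total:       4
HugePages_Free:        4
Hugepagesize:       2048 kB
Hugetlb:         1056768 kB
"""
```

On a live system the same functions could be fed the contents of
/proc/meminfo, e.g. parse_meminfo(open("/proc/meminfo").read()), assuming the
kernel is recent enough to export the 'Hugetlb' field.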
--
Nitesh