Re: [PATCH] mm/hugetlb: avoid weird message in hugetlb_init


From: Nitesh Narayan Lal <nitesh@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	"Longpeng (Mike)" <longpeng2@huawei.com>
Cc: arei.gonglei@huawei.com, huangzhichao@huawei.com,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>, Qian Cai <cai@lca.pw>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm/hugetlb: avoid weird message in hugetlb_init
Date: Fri, 10 Apr 2020 11:47:46 -0400
Message-ID: <641eae15-1ea7-c573-0d64-09dcccc1717d@redhat.com>
In-Reply-To: <43017337-fe28-16e0-fbdd-d6368bdd2eb2@oracle.com>




On 3/6/20 3:12 PM, Mike Kravetz wrote:
> On 3/5/20 10:36 PM, Longpeng (Mike) wrote:
>> On 2020/3/6 8:09, Mike Kravetz wrote:
>>> On 3/4/20 7:30 PM, Longpeng(Mike) wrote:
>>>> From: Longpeng <longpeng2@huawei.com>
>>> I am thinking we may want to have a more generic solution by allowing
>>> the default_hugepagesz= processing code to verify the passed size and
>>> set up the corresponding hstate.  This would require more cooperation
>>> between architecture specific and independent code.  This could be
>>> accomplished with a simple arch_hugetlb_valid_size() routine provided
>>> by the architectures.  Below is an untested patch to add such support
>>> to the architecture independent code and x86.  Other architectures would
>>> be similar.
>>>
>>> In addition, with architectures providing arch_hugetlb_valid_size() it
>>> should be possible to have a common routine in architecture independent
>>> code to read/process hugepagesz= command line arguments.
>>>
>> I just wanted to make minimal changes to address this issue, so I chose the
>> approach taken in my patch.
>>
>> To be honest, the approach you suggested above is much better, though it needs
>> more changes.
>>
>>> Of course, another approach would be to simply require ALL architectures
>>> to set up hstates for ALL supported huge page sizes.
>>>
>> I think this is also needed; then we could request huge pages of every supported
>> size dynamically through sysfs (e.g. /sys/kernel/mm/hugepages/*). Currently (on x86)
>> we can only request 1G huge pages through sysfs if we boot with
>> 'default_hugepagesz=1G', even with the first approach.
> I 'think' you can use sysfs for 1G huge pages on x86 today.  Just booted a
> system without any hugepage options on the command line.
>
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> 0
> # echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> 1
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> 1
>
> x86 and riscv will set up hstates for PUD_SIZE pages by default if
> CONFIG_CONTIG_ALLOC is enabled.  This is because of a somewhat recent feature that
> allowed dynamic allocation of gigantic (page order >= MAX_ORDER) pages.
> Before that feature, it made no sense to set up an hstate for gigantic
> pages if they were not allocated at boot time and could not be dynamically
> added later.
>
> I'll code up a proposal that does the following:
> - Have arch specific code provide a list of supported huge page sizes
> - Arch independent code uses list to create all hstates
> - Move processing of "hugepagesz=" to arch independent code
> - Validate "default_hugepagesz=" when value is read from command line
>
> It may take a few days.  When ready, I will pull in the architecture
> specific people.
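The proposal above can be sketched as a small user-space C model. This is only an illustration under stated assumptions: the size table, the arch_hugetlb_valid_size() signature, parse_size(), struct hstate_model and setup_default_hugepagesz() are hypothetical stand-ins rather than kernel code, and the 2 MB / 1 GB sizes assume x86-64.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-arch list of supported huge page sizes, in bytes
 * (x86-64 values assumed: 2 MB PMD pages and 1 GB PUD pages). */
static const unsigned long arch_supported_sizes[] = { 2UL << 20, 1UL << 30 };
#define NR_ARCH_SIZES (sizeof(arch_supported_sizes) / sizeof(arch_supported_sizes[0]))

/* Hypothetical stand-in for the proposed arch_hugetlb_valid_size() hook:
 * true if the architecture supports this huge page size. */
static bool arch_hugetlb_valid_size(unsigned long size)
{
	for (size_t i = 0; i < NR_ARCH_SIZES; i++)
		if (arch_supported_sizes[i] == size)
			return true;
	return false;
}

/* Simplified "2M"/"1G" parser in the spirit of the kernel's memparse();
 * returns 0 for malformed input. */
static unsigned long parse_size(const char *arg)
{
	char *end;
	unsigned long val = strtoul(arg, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		end++;
		break;
	case '\0':
		break;
	default:
		return 0;
	}
	return *end == '\0' ? val : 0;
}

/* Minimal hstate model: one entry per supported size. */
struct hstate_model {
	unsigned long size;
};

static struct hstate_model hstates[NR_ARCH_SIZES];
static size_t nr_hstates;
static long default_hstate_idx = -1;

/* Arch-independent code creates an hstate for every size the arch lists. */
static void create_all_hstates(void)
{
	for (nr_hstates = 0; nr_hstates < NR_ARCH_SIZES; nr_hstates++)
		hstates[nr_hstates].size = arch_supported_sizes[nr_hstates];
}

/* Validate default_hugepagesz= as soon as it is read; returns the index
 * of the matching hstate, or -1 if the size is unsupported. */
static long setup_default_hugepagesz(const char *arg)
{
	unsigned long size = parse_size(arg);

	if (!arch_hugetlb_valid_size(size))
		return -1;
	for (size_t i = 0; i < nr_hstates; i++) {
		if (hstates[i].size == size) {
			default_hstate_idx = (long)i;
			return default_hstate_idx;
		}
	}
	return -1;
}
```

In this model, an unsupported value such as default_hugepagesz=4M is rejected at parse time, instead of being carried forward and only noticed later during hugetlb initialization.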

Hi Mike,

On platforms that support multiple huge page sizes, when 'hugepagesz=' is not
specified before 'hugepages=', huge pages are not allocated (for example, when
requesting 1GB huge pages).

In terms of reporting, meminfo and /sys/kernel/../nr_hugepages report the
expected results, but sysctl vm.nr_hugepages reports a non-zero value because
it reads max_huge_pages from the default hstate instead of nr_huge_pages.
AFAIK, nr_huge_pages is the field that indicates the number of huge pages that
were successfully allocated.

Is vm.nr_hugepages expected to report the maximum number of huge pages? If
so, would it not make sense to rename the procname?

However, if we expect nr_hugepages to report the number of successfully
allocated huge pages, then we should use nr_huge_pages in
hugetlb_sysctl_handler_common().
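To make the distinction concrete, here is a minimal C model of the two counters. It is not the kernel's hugetlb_sysctl_handler_common(); struct hstate_counters and both helper functions are hypothetical, and only the field names max_huge_pages and nr_huge_pages come from the discussion above.

```c
/* Hypothetical, simplified stand-in for struct hstate: only the two
 * counters discussed in this message are modelled. */
struct hstate_counters {
	unsigned long max_huge_pages; /* requested (target) pool size */
	unsigned long nr_huge_pages;  /* pages actually allocated */
};

/* Behaviour described above: vm.nr_hugepages is read from
 * max_huge_pages of the default hstate. */
static unsigned long sysctl_nr_hugepages_current(const struct hstate_counters *h)
{
	return h->max_huge_pages;
}

/* Alternative raised above: report only the pages that were
 * successfully allocated. */
static unsigned long sysctl_nr_hugepages_proposed(const struct hstate_counters *h)
{
	return h->nr_huge_pages;
}
```

With a default hstate whose target is 4 pages but where no pages were allocated, the first helper reports 4 while the second reports 0, which is the mismatch described above.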


>
>> BTW, since the time difference makes it hard to discuss with you, I'll ask
>> another question about default hugepages here. Why does /proc/meminfo only
>> show information about the default huge page size, but not the others?
>> meminfo is better known than sysfs; some ordinary users know meminfo but don't
>> know how to use sysfs to get hugepage status (e.g. total, free).
> I believe that is simply history.  In the beginning there was only the
> default huge page size and that was added to meminfo.  People then wrote
> scripts to parse huge page information in meminfo.  When support for
> other huge pages was added, it was not added to meminfo as it could break
> user scripts parsing the file.  Adding information for all potential
> huge page sizes may create lots of entries that are unused.  I was not
> around when these decisions were made, but that is my understanding.
> BTW - A recently added meminfo field 'Hugetlb' displays the amount of
> memory consumed by huge pages of ALL sizes.
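As a rough sketch, assuming the 'Hugetlb' field is the sum over all hstates of page size times the number of pages in each pool, reported in kB, the total could be computed as below. struct hstate_usage and hugetlb_total_kb() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical per-size usage record. */
struct hstate_usage {
	unsigned long page_size;     /* bytes per huge page */
	unsigned long nr_huge_pages; /* pages in this size's pool */
};

/* Sum the memory held by huge pages of every size and return it in kB,
 * i.e. a single total covering ALL huge page sizes. */
static unsigned long hugetlb_total_kb(const struct hstate_usage *hs, size_t n)
{
	unsigned long long total = 0;

	for (size_t i = 0; i < n; i++)
		total += (unsigned long long)hs[i].page_size * hs[i].nr_huge_pages;
	return (unsigned long)(total / 1024);
}
```

For example, 512 pages of 2 MB plus one 1 GB page gives 2097152 kB (2 GB).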
-- 
Nitesh



Thread overview: 10+ messages
2020-03-05  3:30 Longpeng(Mike)
2020-03-06  0:09 ` Mike Kravetz
2020-03-06  6:36   ` Longpeng (Mike)
2020-03-06 20:12     ` Mike Kravetz
2020-03-09  8:16       ` Longpeng (Mike)
2020-04-10 15:47       ` Nitesh Narayan Lal [this message]
2020-04-13 18:33         ` Mike Kravetz
2020-04-13 21:21           ` Nitesh Narayan Lal
2020-04-15  4:03             ` Mike Kravetz
2020-04-15 11:46               ` Nitesh Narayan Lal
