From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: lcapitulino@redhat.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
mtosatti@redhat.com, aarcange@redhat.com, mgorman@suse.de,
akpm@linux-foundation.org, andi@firstfloor.org, davidlohr@hp.com,
rientjes@google.com, isimatu.yasuaki@jp.fujitsu.com,
yinghai@kernel.org, riel@redhat.com
Subject: Re: [PATCH 5/5] hugetlb: add support for gigantic page allocation at runtime
Date: Tue, 08 Apr 2014 16:38:03 -0400
Message-ID: <53445e4b.02d50e0a.2fea.fffff738SMTPIN_ADDED_BROKEN@mx.google.com>
In-Reply-To: <1396983740-26047-6-git-send-email-lcapitulino@redhat.com>
On Tue, Apr 08, 2014 at 03:02:20PM -0400, Luiz Capitulino wrote:
> HugeTLB is limited to allocating hugepages whose size is less than
> MAX_ORDER order. This is so because HugeTLB allocates hugepages via
> the buddy allocator. Gigantic pages (that is, pages whose size is
> greater than MAX_ORDER order) have to be allocated at boottime.
>
> However, boottime allocation has at least two serious problems. First,
> it doesn't support NUMA and second, gigantic pages allocated at
> boottime can't be freed.
>
> This commit solves both issues by adding support for allocating gigantic
> pages during runtime. It works just like regular sized hugepages,
> meaning that the interface in sysfs is the same, it supports NUMA,
> and gigantic pages can be freed.
>
> For example, on x86_64 gigantic pages are 1GB big. To allocate two 1G
> gigantic pages on node 1, one can do:
>
> # echo 2 > \
> /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
>
> And to free them all:
>
> # echo 0 > \
> /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
>
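For scripting the two echo commands above from a program, here is a minimal
userspace sketch in C. The helper names hugepages_sysfs_path() and
set_nr_hugepages() are hypothetical and not part of the patch; only the sysfs
path layout is taken from the example above. Writing the file needs root, and
the write can still fail or fall short if no contiguous region is available.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the per-node nr_hugepages path for a given
 * huge page size in kB (1048576 for 1GB pages on x86_64).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int hugepages_sysfs_path(char *buf, size_t len, int node,
				unsigned long size_kb)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/node/node%d/hugepages/"
			 "hugepages-%lukB/nr_hugepages", node, size_kb);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write the requested page count to the file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int set_nr_hugepages(const char *path, unsigned long count)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", count) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling hugepages_sysfs_path(buf, sizeof(buf), 1, 1048576) and then
set_nr_hugepages(buf, 2) corresponds to the first command; passing 0 as the
count corresponds to the second.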
> The one problem with gigantic page allocation at runtime is that it
> can't be serviced by the buddy allocator. To overcome that problem, this
> commit scans all zones from a node looking for a large enough contiguous
> region. When one is found, it's allocated by using CMA, that is, we call
> alloc_contig_range() to do the actual allocation. For example, on x86_64
> we scan all zones looking for a 1GB contiguous region. When one is found,
> it's allocated by alloc_contig_range().
>
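To make the scan described above concrete, here is a minimal userspace model
in C. It is a sketch under stated assumptions, not the code from this patch:
struct zone_model, range_is_free() and find_gigantic_range() are hypothetical
names, a plain bool array stands in for the kernel's per-page state, and the
candidate range is assumed to be naturally aligned to the gigantic page size.
In the kernel the range found would then be handed to alloc_contig_range(),
which can still fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of one zone: free[i] is true when page frame
 * (start_pfn + i) is free. This is not a kernel structure.
 */
struct zone_model {
	unsigned long start_pfn;
	size_t nr_pages;
	const bool *free;
};

#define NO_PFN (~0UL)

/* True when every frame in [pfn, pfn + nr) lies in the zone and is free. */
static bool range_is_free(const struct zone_model *z, unsigned long pfn,
			  unsigned long nr)
{
	unsigned long i;

	if (pfn < z->start_pfn || pfn - z->start_pfn + nr > z->nr_pages)
		return false;
	for (i = 0; i < nr; i++)
		if (!z->free[pfn - z->start_pfn + i])
			return false;
	return true;
}

/*
 * Walk each zone in order and return the first aligned pfn that starts a
 * fully free run of nr_pages frames, or NO_PFN if no zone has one.
 * nr_pages must be a power of two.
 */
static unsigned long find_gigantic_range(const struct zone_model *zones,
					 size_t nr_zones,
					 unsigned long nr_pages)
{
	size_t i;

	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);

	for (i = 0; i < nr_zones; i++) {
		const struct zone_model *z = &zones[i];
		unsigned long end = z->start_pfn + z->nr_pages;
		/* Round the zone start up to the next aligned frame. */
		unsigned long pfn = (z->start_pfn + nr_pages - 1) &
				    ~(nr_pages - 1);

		for (; pfn + nr_pages <= end; pfn += nr_pages)
			if (range_is_free(z, pfn, nr_pages))
				return pfn;
	}
	return NO_PFN;
}
```

The model steps one aligned candidate at a time, so a single busy frame
anywhere in a candidate disqualifies the whole range. That is also why the
regions become harder to find the longer the system has been running.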
> One expected issue with that approach is that such gigantic contiguous
> regions tend to vanish as runtime goes by. The best way to avoid this for
> now is to make gigantic page allocations very early during system boot, say
> from an init script. Other possible optimizations include using compaction,
> which is supported by CMA but is not explicitly used by this commit.
>
> It's also important to note the following:
>
> 1. Gigantic pages allocated at boottime by the hugepages= command-line
> option can be freed at runtime just fine
>
> 2. This commit adds support for gigantic pages only to x86_64. The
> reason is that I don't have access to nor experience with other archs.
> The code is arch independent though, so it should be simple to add
> support to different archs
>
> 3. I didn't add support for hugepage overcommit, that is allocating
> a gigantic page on demand when
> /proc/sys/vm/nr_overcommit_hugepages > 0. The reason is that I don't
> think it's reasonable to do the hard and long work required for
> allocating a gigantic page at fault time. But it should be simple
> to add this if wanted
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Looks good to me, thanks.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>