From: David Rientjes <rientjes@google.com>
To: Mina Almasry <almasrymina@google.com>
Cc: mike.kravetz@oracle.com, shuah@kernel.org, shakeelb@google.com,
gthelen@google.com, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org,
aneesh.kumar@linux.vnet.ibm.com, mkoutny@suse.com,
Hillf Danton <hdanton@sina.com>
Subject: Re: [PATCH v9 8/8] hugetlb_cgroup: Add hugetlb_cgroup reservation docs
Date: Mon, 13 Jan 2020 16:54:46 -0800 (PST)
Message-ID: <alpine.DEB.2.21.2001131649100.164268@chino.kir.corp.google.com>
In-Reply-To: <20191217231615.164161-8-almasrymina@google.com>

On Tue, 17 Dec 2019, Mina Almasry wrote:
> diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
> index a3902aa253a96..efb94e4db9d5a 100644
> --- a/Documentation/admin-guide/cgroup-v1/hugetlb.rst
> +++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
> @@ -2,13 +2,6 @@
> HugeTLB Controller
> ==================
>
> -The HugeTLB controller allows to limit the HugeTLB usage per control group and
> -enforces the controller limit during page fault. Since HugeTLB doesn't
> -support page reclaim, enforcing the limit at page fault time implies that,
> -the application will get SIGBUS signal if it tries to access HugeTLB pages
> -beyond its limit. This requires the application to know beforehand how much
> -HugeTLB pages it would require for its use.
> -
> HugeTLB controller can be created by first mounting the cgroup filesystem.
>
> # mount -t cgroup -o hugetlb none /sys/fs/cgroup
> @@ -28,10 +21,14 @@ process (bash) into it.
>
> Brief summary of control files::
>
> - hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage
> - hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
> - hugetlb.<hugepagesize>.usage_in_bytes # show current usage for "hugepagesize" hugetlb
> - hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit
> + hugetlb.<hugepagesize>.reservation_limit_in_bytes # set/show limit of "hugepagesize" hugetlb reservations
> + hugetlb.<hugepagesize>.reservation_max_usage_in_bytes # show max "hugepagesize" hugetlb reservations and no-reserve faults recorded
> + hugetlb.<hugepagesize>.reservation_usage_in_bytes # show current reservations and no-reserve faults for "hugepagesize" hugetlb
> + hugetlb.<hugepagesize>.reservation_failcnt # show the number of allocation failures due to the HugeTLB reservation limit
> + hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb faults
> + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
> + hugetlb.<hugepagesize>.usage_in_bytes # show current usage for "hugepagesize" hugetlb
> + hugetlb.<hugepagesize>.failcnt # show the number of allocation failures due to the HugeTLB usage limit
>
I assume these are better named hugetlb.<hugepagesize>.reservation.*
rather than reservation_*, or perhaps shortened to resv.*, so for example
hugetlb.<hugepagesize>.resv.limit_in_bytes.
> For a system supporting three hugepage sizes (64k, 32M and 1G), the control
> files include::
> @@ -40,11 +37,56 @@ files include::
> hugetlb.1GB.max_usage_in_bytes
> hugetlb.1GB.usage_in_bytes
> hugetlb.1GB.failcnt
> + hugetlb.1GB.reservation_limit_in_bytes
> + hugetlb.1GB.reservation_max_usage_in_bytes
> + hugetlb.1GB.reservation_usage_in_bytes
> + hugetlb.1GB.reservation_failcnt
> hugetlb.64KB.limit_in_bytes
> hugetlb.64KB.max_usage_in_bytes
> hugetlb.64KB.usage_in_bytes
> hugetlb.64KB.failcnt
> + hugetlb.64KB.reservation_limit_in_bytes
> + hugetlb.64KB.reservation_max_usage_in_bytes
> + hugetlb.64KB.reservation_usage_in_bytes
> + hugetlb.64KB.reservation_failcnt
> hugetlb.32MB.limit_in_bytes
> hugetlb.32MB.max_usage_in_bytes
> hugetlb.32MB.usage_in_bytes
> hugetlb.32MB.failcnt
> + hugetlb.32MB.reservation_limit_in_bytes
> + hugetlb.32MB.reservation_max_usage_in_bytes
> + hugetlb.32MB.reservation_usage_in_bytes
> + hugetlb.32MB.reservation_failcnt
> +
> +
> +1. Reservation limits
Should probably be described after the page fault limits since those are
the canonical limits that already exist and the "reservation_.*"
equivalents are supplementary.
> +
> +The HugeTLB controller allows limiting the HugeTLB reservations per control
> +group and enforces the controller limit at reservation time and at the fault of
> +hugetlb memory for which no reservation exists. Reservation limits
> +are superior to page fault limits (see section 2), since reservation limits are
> +enforced at reservation time (on mmap or shmget), and never cause the
> +application to get a SIGBUS signal if the memory was reserved beforehand. For
> +MAP_NORESERVE allocations, the reservation limit behaves the same as the fault
> +limit, enforcing memory usage at fault time and causing the application to
> +receive a SIGBUS if it crosses its limit.
> +
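A toy model may help pin down the accounting described above. It is
hypothetical: toy_cgroup, toy_mmap() and toy_fault() are made-up names,
not kernel functions, and it simplifies heavily (abstract size units,
nothing is ever uncharged, shared and private mappings are not
distinguished). It assumes, per the text above, that a normal mapping is
charged to the reservation counter at mmap() time, that every fault is
charged to the existing usage counter, and that a MAP_NORESERVE fault is
charged to both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of the two counters; not kernel code.
 * Sizes are abstract units and nothing is ever uncharged. */
struct counter {
	uint64_t usage;
	uint64_t limit;
};

/* Charge bytes only if that stays within the limit; report success. */
static bool counter_try_charge(struct counter *c, uint64_t bytes)
{
	if (c->usage + bytes > c->limit)
		return false;
	c->usage += bytes;
	return true;
}

struct toy_cgroup {
	struct counter resv;  /* stands in for reservation_{usage,limit}_in_bytes */
	struct counter fault; /* stands in for {usage,limit}_in_bytes */
};

/* mmap(): a normal mapping is charged to the reservation counter up
 * front; MAP_NORESERVE charges nothing yet. false models MAP_FAILED. */
static bool toy_mmap(struct toy_cgroup *cg, uint64_t bytes, bool noreserve)
{
	if (noreserve)
		return true;
	return counter_try_charge(&cg->resv, bytes);
}

/* Page fault: every fault is charged to the fault counter; a
 * MAP_NORESERVE fault is also charged to the reservation counter.
 * false models the SIGBUS case. */
static bool toy_fault(struct toy_cgroup *cg, uint64_t bytes, bool noreserve)
{
	if (noreserve && !counter_try_charge(&cg->resv, bytes))
		return false;
	if (!counter_try_charge(&cg->fault, bytes)) {
		if (noreserve)
			cg->resv.usage -= bytes; /* undo the reservation charge */
		return false;
	}
	return true;
}
```

In this model a failed toy_mmap() is the recoverable MAP_FAILED case,
while a failed toy_fault() is the SIGBUS case, which only a
MAP_NORESERVE mapping (or the fault limit itself) can reach.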
When saying that reservation limits are superior to page fault limits, it
might be helpful to expand on the downsides of page fault limits. The
existing documentation calls out that the application needs to know its
expected usage; it does not call attention to the fact that several
different applications may be accessing an overcommitted system-wide pool
of hugetlb memory. So it might be possible that the application
understands its own usage but it may not understand how that is
orchestrated with other applications on the same system accessing a shared
pool of hugetlb pages.

But yes, I think failing mmap() with MAP_FAILED, which allows the
application to fall back or free hugetlb memory, is far superior to SIGBUS :)
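For illustration, a minimal sketch of the fallback an application can
implement when the failure surfaces at mmap() time instead of as SIGBUS.
map_with_fallback() and struct mapping are hypothetical names. It assumes
Linux, MAP_HUGETLB with the default huge page size, and a length that is a
multiple of that size. Whether the hugetlb attempt fails because the pool
is short or, per the patch description, because the cgroup's reservation
limit is exceeded, the caller sees the same MAP_FAILED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

struct mapping {
	void *addr;   /* NULL if both attempts failed */
	size_t len;
	bool huge;    /* true if backed by hugetlb pages */
};

/* Try a private anonymous hugetlb mapping first (its huge pages are
 * reserved at mmap() time); on MAP_FAILED fall back to normal pages.
 * len is assumed to be a multiple of the default huge page size. */
static struct mapping map_with_fallback(size_t len)
{
	struct mapping m = { NULL, len, false };
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p != MAP_FAILED) {
		m.addr = p;
		m.huge = true;
		return m;
	}

	/* Recoverable failure: retry with regular pages. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p != MAP_FAILED)
		m.addr = p;
	return m;
}
```

Usage: `struct mapping m = map_with_fallback(2UL << 20);`, check
`m.addr != NULL`, and release with `munmap(m.addr, m.len)`.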
> +2. Page fault limits
> +
> +The HugeTLB controller allows limiting the HugeTLB usage (page faults) per
> +control group and enforces the controller limit during page fault. Since HugeTLB
> +doesn't support page reclaim, enforcing the limit at page fault time implies
> +that the application will get a SIGBUS signal if it tries to access HugeTLB
> +pages beyond its limit. This requires the application to know beforehand how
> +many HugeTLB pages it will require.
> +
> +
> +3. Caveats with shared memory
> +
> +For shared hugetlb memory, both hugetlb reservations and page faults are charged
> +to the first task that causes the memory to be reserved or faulted, and all
> +subsequent uses of this reserved or faulted memory are not charged.
> +
> +Shared hugetlb memory is only uncharged when it is unreserved or deallocated.
> +This is usually when the hugetlbfs file is deleted, and not when the task that
> +caused the reservation or fault has exited.
Should this section also discuss reparenting, i.e. what happens to these
charges when the charged cgroup is removed?
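The shared-memory caveat can also be expressed as a toy model.
Everything in it (struct toy_cg, struct toy_shared_file, toy_use(),
toy_delete()) is hypothetical and only mirrors the patch text: the first
cgroup to reserve or fault the shared memory is charged, later users are
not, and the charge is dropped when the file is deleted rather than when
a task exits. What should happen to charged_to if that first cgroup is
removed while the file still exists is the reparenting question raised
above; the model deliberately leaves it out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of shared hugetlb charging; not kernel code. */
struct toy_cg {
	uint64_t charged;
};

struct toy_shared_file {
	struct toy_cg *charged_to; /* first cgroup to reserve/fault, or NULL */
	uint64_t bytes;
};

/* Only the first user is charged; later users get the memory for free. */
static void toy_use(struct toy_shared_file *f, struct toy_cg *cg)
{
	if (f->charged_to == NULL) {
		f->charged_to = cg;
		cg->charged += f->bytes;
	}
}

/* The charge is released only when the file goes away, regardless of
 * whether the task that caused the charge is still running. */
static void toy_delete(struct toy_shared_file *f)
{
	if (f->charged_to != NULL) {
		f->charged_to->charged -= f->bytes;
		f->charged_to = NULL;
	}
}
```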