From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>,
Frank van der Linden <fvdl@google.com>,
akpm@linux-foundation.org, riel@surriel.com,
roman.gushchin@linux.dev, shakeelb@google.com,
muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
shuah@kernel.org, mike.kravetz@oracle.com,
yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
Date: Wed, 27 Sep 2023 10:22:54 -0700 [thread overview]
Message-ID: <CAKEwX=Ocm_Zn=3P0gBdJKSwyqWq3fX37OEGAjCA5vKgJb+QGvw@mail.gmail.com> (raw)
In-Reply-To: <20230927164430.GB365513@cmpxchg.org>
On Wed, Sep 27, 2023 at 9:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Sep 27, 2023 at 02:50:10PM +0200, Michal Hocko wrote:
> > On Tue 26-09-23 18:14:14, Johannes Weiner wrote:
> > [...]
> > > The fact that memory consumed by hugetlb is currently not considered
> > > inside memcg (host memory accounting and control) is inconsistent. It
> > > has been quite confusing to our service owners and complicating things
> > > for our containers team.
> >
> > I do understand how that is confusing and inconsistent as well. Hugetlb
> > is bringing throughout its existence I am afraid.
> >
> > As noted in other reply though I am not sure hugeltb pool can be
> > reasonably incorporated with a sane semantic. Neither of the regular
> > allocation nor the hugetlb reservation/actual use can fallback to the
> > pool of the other. This makes them 2 different things each hitting their
> > own failure cases that require a dedicated handling.
> >
> > Just from top of my head these are cases I do not see easy way out from:
> > - hugetlb charge failure has two failure modes - pool empty
> > or memcg limit reached. The former is not recoverable and
> > should fail without any further intervention the latter might
> > benefit from reclaiming.
> > - !hugetlb memory charge failure cannot consider any hugetlb
> > pages - they are implicit memory.min protection so it is
> > impossible to manage reclaim protection without having a
> > knowledge of the hugetlb use.
> > - there is no way to control the hugetlb pool distribution by
> > memcg limits. How do we distinguish reservations from actual
> > use?
> > - pre-allocated pool is consuming memory without any actual
> > owner until it is actually used and even that has two stages
> > (reserved and really used). This makes it really hard to
> > manage memory as whole when there is a considerable amount of
> > hugetlb memore preallocated.
>
> It's important to distinguish hugetlb access policy from memory use
> policy. This patch isn't about hugetlb access, it's about general
> memory use.
>
> Hugetlb access policy is a separate domain with separate
> answers. Preallocating is a privileged operation, for access control
> there is the hugetlb cgroup controller etc.
>
> What's missing is that once you get past the access restrictions and
> legitimately get your hands on huge pages, that memory use gets
> reflected in memory.current and exerts pressure on *other* memory
> inside the group, such as anon or optimistic cache allocations.
>
> Note that hugetlb *can* be allocated on demand. It's unexpected that
> when an application optimistically allocates a couple of 2M hugetlb
> pages those aren't reflected in its memory.current. The same is true
> for hugetlb_cma. If the gigantic pages aren't currently allocated to a
> cgroup, that CMA memory can be used for movable memory elsewhere.
>
> The points you and Frank raise are reasons and scenarios where
> additional hugetlb access control is necessary - preallocation,
> limited availability of 1G pages etc. But they're not reasons against
> charging faulted in hugetlb to the memcg *as well*.
>
> My point is we need both. One to manage competition over hugetlb,
> because it has unique limitations. The other to manage competition
> over host memory which hugetlb is a part of.
>
> Here is a usecase from our fleet.
>
> Imagine a configuration with two 32G containers. The machine is booted
> with hugetlb_cma=6G, and each container may or may not use up to 3
> gigantic page, depending on the workload within it. The rest is anon,
> cache, slab, etc. You set the hugetlb cgroup limit of each cgroup to
> 3G to enforce hugetlb fairness. But how do you configure memory.max to
> keep *overall* consumption, including anon, cache, slab etc. fair?
>
> If used hugetlb is charged, you can just set memory.max=32G regardless
> of the workload inside.
>
> Without it, you'd have to constantly poll hugetlb usage and readjust
> memory.max!
Yep, and I'd like to add that this could and have caused issues in
our production system, when there is a delay in memory limits
(low or max) correction. The userspace agent in charge of correcting
these only runs periodically, and within consecutive runs the system
could be in an over/underprotected state. An instantaneous charge
towards the memory controller would close this gap.
I think we need both a HugeTLB controller and memory controller
accounting.
next prev parent reply other threads:[~2023-09-27 17:23 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-26 19:49 Nhat Pham
2023-09-26 19:49 ` [PATCH 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller Nhat Pham
2023-09-26 20:50 ` [PATCH 0/2] hugetlb memcg accounting Frank van der Linden
2023-09-26 22:14 ` Johannes Weiner
2023-09-27 12:50 ` Michal Hocko
2023-09-27 16:44 ` Johannes Weiner
2023-09-27 17:22 ` Nhat Pham [this message]
2023-09-26 23:31 ` Nhat Pham
2023-09-27 11:21 ` Michal Hocko
2023-09-27 18:47 ` Johannes Weiner
2023-09-27 21:37 ` Roman Gushchin
2023-09-28 12:52 ` Johannes Weiner
2023-10-01 23:27 ` Mike Kravetz
2023-10-02 14:42 ` Johannes Weiner
2023-10-02 14:58 ` Michal Hocko
2023-10-02 15:36 ` Johannes Weiner
2023-09-27 23:33 ` Nhat Pham
2023-09-28 1:00 ` Nhat Pham
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKEwX=Ocm_Zn=3P0gBdJKSwyqWq3fX37OEGAjCA5vKgJb+QGvw@mail.gmail.com' \
--to=nphamcs@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=fvdl@google.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizefan.x@bytedance.com \
--cc=mhocko@suse.com \
--cc=mike.kravetz@oracle.com \
--cc=muchun.song@linux.dev \
--cc=riel@surriel.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=shuah@kernel.org \
--cc=tj@kernel.org \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox