linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>,
	Frank van der Linden <fvdl@google.com>,
	akpm@linux-foundation.org,  riel@surriel.com,
	roman.gushchin@linux.dev, shakeelb@google.com,
	 muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
	 shuah@kernel.org, mike.kravetz@oracle.com,
	yosryahmed@google.com,  linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org,  cgroups@vger.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
Date: Wed, 27 Sep 2023 10:22:54 -0700	[thread overview]
Message-ID: <CAKEwX=Ocm_Zn=3P0gBdJKSwyqWq3fX37OEGAjCA5vKgJb+QGvw@mail.gmail.com> (raw)
In-Reply-To: <20230927164430.GB365513@cmpxchg.org>

On Wed, Sep 27, 2023 at 9:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Sep 27, 2023 at 02:50:10PM +0200, Michal Hocko wrote:
> > On Tue 26-09-23 18:14:14, Johannes Weiner wrote:
> > [...]
> > > The fact that memory consumed by hugetlb is currently not considered
> > > inside memcg (host memory accounting and control) is inconsistent. It
> > > has been quite confusing to our service owners and complicating things
> > > for our containers team.
> >
> > I do understand how that is confusing and inconsistent as well. Hugetlb
> > is bringing throughout its existence I am afraid.
> >
> > As noted in other reply though I am not sure hugeltb pool can be
> > reasonably incorporated with a sane semantic. Neither of the regular
> > allocation nor the hugetlb reservation/actual use can fallback to the
> > pool of the other. This makes them 2 different things each hitting their
> > own failure cases that require a dedicated handling.
> >
> > Just from top of my head these are cases I do not see easy way out from:
> >       - hugetlb charge failure has two failure modes - pool empty
> >         or memcg limit reached. The former is not recoverable and
> >         should fail without any further intervention the latter might
> >         benefit from reclaiming.
> >       - !hugetlb memory charge failure cannot consider any hugetlb
> >         pages - they are implicit memory.min protection so it is
> >           impossible to manage reclaim protection without having a
> >           knowledge of the hugetlb use.
> >       - there is no way to control the hugetlb pool distribution by
> >         memcg limits. How do we distinguish reservations from actual
> >         use?
> >       - pre-allocated pool is consuming memory without any actual
> >         owner until it is actually used and even that has two stages
> >         (reserved and really used). This makes it really hard to
> >         manage memory as whole when there is a considerable amount of
> >         hugetlb memore preallocated.
>
> It's important to distinguish hugetlb access policy from memory use
> policy. This patch isn't about hugetlb access, it's about general
> memory use.
>
> Hugetlb access policy is a separate domain with separate
> answers. Preallocating is a privileged operation, for access control
> there is the hugetlb cgroup controller etc.
>
> What's missing is that once you get past the access restrictions and
> legitimately get your hands on huge pages, that memory use gets
> reflected in memory.current and exerts pressure on *other* memory
> inside the group, such as anon or optimistic cache allocations.
>
> Note that hugetlb *can* be allocated on demand. It's unexpected that
> when an application optimistically allocates a couple of 2M hugetlb
> pages those aren't reflected in its memory.current. The same is true
> for hugetlb_cma. If the gigantic pages aren't currently allocated to a
> cgroup, that CMA memory can be used for movable memory elsewhere.
>
> The points you and Frank raise are reasons and scenarios where
> additional hugetlb access control is necessary - preallocation,
> limited availability of 1G pages etc. But they're not reasons against
> charging faulted in hugetlb to the memcg *as well*.
>
> My point is we need both. One to manage competition over hugetlb,
> because it has unique limitations. The other to manage competition
> over host memory which hugetlb is a part of.
>
> Here is a usecase from our fleet.
>
> Imagine a configuration with two 32G containers. The machine is booted
> with hugetlb_cma=6G, and each container may or may not use up to 3
> gigantic page, depending on the workload within it. The rest is anon,
> cache, slab, etc. You set the hugetlb cgroup limit of each cgroup to
> 3G to enforce hugetlb fairness. But how do you configure memory.max to
> keep *overall* consumption, including anon, cache, slab etc. fair?
>
> If used hugetlb is charged, you can just set memory.max=32G regardless
> of the workload inside.
>
> Without it, you'd have to constantly poll hugetlb usage and readjust
> memory.max!

Yep, and I'd like to add that this could and have caused issues in
our production system, when there is a delay in memory limits
(low or max) correction. The userspace agent in charge of correcting
these only runs periodically, and within consecutive runs the system
could be in an over/underprotected state. An instantaneous charge
towards the memory controller would close this gap.

I think we need both a HugeTLB controller and memory controller
accounting.


  reply	other threads:[~2023-09-27 17:23 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-26 19:49 Nhat Pham
2023-09-26 19:49 ` [PATCH 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller Nhat Pham
2023-09-26 20:50 ` [PATCH 0/2] hugetlb memcg accounting Frank van der Linden
2023-09-26 22:14   ` Johannes Weiner
2023-09-27 12:50     ` Michal Hocko
2023-09-27 16:44       ` Johannes Weiner
2023-09-27 17:22         ` Nhat Pham [this message]
2023-09-26 23:31   ` Nhat Pham
2023-09-27 11:21 ` Michal Hocko
2023-09-27 18:47   ` Johannes Weiner
2023-09-27 21:37     ` Roman Gushchin
2023-09-28 12:52       ` Johannes Weiner
2023-10-01 23:27     ` Mike Kravetz
2023-10-02 14:42       ` Johannes Weiner
2023-10-02 14:58         ` Michal Hocko
2023-10-02 15:36           ` Johannes Weiner
2023-09-27 23:33   ` Nhat Pham
2023-09-28  1:00 ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=Ocm_Zn=3P0gBdJKSwyqWq3fX37OEGAjCA5vKgJb+QGvw@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=fvdl@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=riel@surriel.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tj@kernel.org \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox