linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.com>
Cc: Frank van der Linden <fvdl@google.com>,
	Nhat Pham <nphamcs@gmail.com>,
	akpm@linux-foundation.org, riel@surriel.com,
	roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
	shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com,
	linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
Date: Wed, 27 Sep 2023 12:44:30 -0400	[thread overview]
Message-ID: <20230927164430.GB365513@cmpxchg.org> (raw)
In-Reply-To: <ZRQlAgDs5W/Lct4k@dhcp22.suse.cz>

On Wed, Sep 27, 2023 at 02:50:10PM +0200, Michal Hocko wrote:
> On Tue 26-09-23 18:14:14, Johannes Weiner wrote:
> [...]
> > The fact that memory consumed by hugetlb is currently not considered
> > inside memcg (host memory accounting and control) is inconsistent. It
> > has been quite confusing to our service owners and complicating things
> > for our containers team.
> 
> I do understand how that is confusing and inconsistent as well. Hugetlb
> is bringing throughout its existence I am afraid.
> 
> As noted in other reply though I am not sure hugeltb pool can be
> reasonably incorporated with a sane semantic. Neither of the regular
> allocation nor the hugetlb reservation/actual use can fallback to the
> pool of the other. This makes them 2 different things each hitting their
> own failure cases that require a dedicated handling.
> 
> Just from top of my head these are cases I do not see easy way out from:
> 	- hugetlb charge failure has two failure modes - pool empty
> 	  or memcg limit reached. The former is not recoverable and
> 	  should fail without any further intervention the latter might
> 	  benefit from reclaiming.
> 	- !hugetlb memory charge failure cannot consider any hugetlb
> 	  pages - they are implicit memory.min protection so it is
>           impossible to manage reclaim protection without having a
>           knowledge of the hugetlb use.
> 	- there is no way to control the hugetlb pool distribution by
> 	  memcg limits. How do we distinguish reservations from actual
> 	  use?
> 	- pre-allocated pool is consuming memory without any actual
> 	  owner until it is actually used and even that has two stages
> 	  (reserved and really used). This makes it really hard to
> 	  manage memory as whole when there is a considerable amount of
> 	  hugetlb memore preallocated.

It's important to distinguish hugetlb access policy from memory use
policy. This patch isn't about hugetlb access, it's about general
memory use.

Hugetlb access policy is a separate domain with separate
answers. Preallocating is a privileged operation, for access control
there is the hugetlb cgroup controller etc.

What's missing is that once you get past the access restrictions and
legitimately get your hands on huge pages, that memory use gets
reflected in memory.current and exerts pressure on *other* memory
inside the group, such as anon or optimistic cache allocations.

Note that hugetlb *can* be allocated on demand. It's unexpected that
when an application optimistically allocates a couple of 2M hugetlb
pages those aren't reflected in its memory.current. The same is true
for hugetlb_cma. If the gigantic pages aren't currently allocated to a
cgroup, that CMA memory can be used for movable memory elsewhere.

The points you and Frank raise are reasons and scenarios where
additional hugetlb access control is necessary - preallocation,
limited availability of 1G pages etc. But they're not reasons against
charging faulted in hugetlb to the memcg *as well*.

My point is we need both. One to manage competition over hugetlb,
because it has unique limitations. The other to manage competition
over host memory which hugetlb is a part of.

Here is a usecase from our fleet.

Imagine a configuration with two 32G containers. The machine is booted
with hugetlb_cma=6G, and each container may or may not use up to 3
gigantic page, depending on the workload within it. The rest is anon,
cache, slab, etc. You set the hugetlb cgroup limit of each cgroup to
3G to enforce hugetlb fairness. But how do you configure memory.max to
keep *overall* consumption, including anon, cache, slab etc. fair?

If used hugetlb is charged, you can just set memory.max=32G regardless
of the workload inside.

Without it, you'd have to constantly poll hugetlb usage and readjust
memory.max!


  reply	other threads:[~2023-09-27 16:44 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-26 19:49 Nhat Pham
2023-09-26 19:49 ` [PATCH 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller Nhat Pham
2023-09-26 20:50 ` [PATCH 0/2] hugetlb memcg accounting Frank van der Linden
2023-09-26 22:14   ` Johannes Weiner
2023-09-27 12:50     ` Michal Hocko
2023-09-27 16:44       ` Johannes Weiner [this message]
2023-09-27 17:22         ` Nhat Pham
2023-09-26 23:31   ` Nhat Pham
2023-09-27 11:21 ` Michal Hocko
2023-09-27 18:47   ` Johannes Weiner
2023-09-27 21:37     ` Roman Gushchin
2023-09-28 12:52       ` Johannes Weiner
2023-10-01 23:27     ` Mike Kravetz
2023-10-02 14:42       ` Johannes Weiner
2023-10-02 14:58         ` Michal Hocko
2023-10-02 15:36           ` Johannes Weiner
2023-09-27 23:33   ` Nhat Pham
2023-09-28  1:00 ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230927164430.GB365513@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=fvdl@google.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=riel@surriel.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tj@kernel.org \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox