From: Roman Gushchin <roman.gushchin@linux.dev>
To: Kairui Song <ryncsn@gmail.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Muchun Song <muchun.song@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
gthelen@google.coma, rientjes@google.com,
Chris Li <chrisl@kernel.org>
Subject: Re: [PATCH rfc 0/9] mm: memcg: separate legacy cgroup v1 code and put under config option
Date: Thu, 23 May 2024 12:55:46 -0700 [thread overview]
Message-ID: <Zk-fQtFrj-2YDJOo@P9FQF9L96D.corp.robot.car> (raw)
In-Reply-To: <CAMgjq7ARr0=OG8GQOJvzLtfdrtiwvJ19-mx1snxqmLE0Za+QCw@mail.gmail.com>
On Thu, May 23, 2024 at 01:58:49AM +0800, Kairui Song wrote:
> On Thu, May 9, 2024 at 2:33 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Wed, May 08, 2024 at 08:41:29PM -0700, Roman Gushchin wrote:
> > > Cgroups v2 have been around for a while and many users have fully adopted them,
> > > so they never use cgroups v1 features and functionality. Yet they have to "pay"
> > > for the cgroup v1 support anyway:
> > > 1) the kernel binary contains useless cgroup v1 code,
> > > 2) some common structures like task_struct and mem_cgroup have never used
> > > cgroup v1-specific members,
> > > 3) some code paths have additional checks which are not needed.
> > >
> > > Cgroup v1's memory controller has a number of features that are not supported
> > > by cgroup v2 and their implementation is pretty much self contained.
> > > Most notably, these features are: soft limit reclaim, oom handling in userspace,
> > > complicated event notification system, charge migration.
> > >
> > > Cgroup v1-specific code in memcontrol.c is close to 4k lines in size and it's
> > > intervened with generic and cgroup v2-specific code. It's a burden on
> > > developers and maintainers.
> > >
> > > This patchset aims to solve these problems by:
> > > 1) moving cgroup v1-specific memcg code to the new mm/memcontrol-v1.c file,
> > > 2) putting definitions shared by memcontrol.c and memcontrol-v1.c into the
> > > mm/internal.h header
> > > 3) introducing the CONFIG_MEMCG_V1 config option, turned on by default
> > > 4) making memcontrol-v1.c to compile only if CONFIG_MEMCG_V1 is set
> > > 5) putting unused struct memory_cgroup and task_struct members under
> > > CONFIG_MEMCG_V1 as well.
> > >
> > > This is an RFC version, which is not 100% polished yet, so but it would be great
> > > to discuss and agree on the overall approach.
> > >
> > > Some open questions, opinions are appreciated:
> > > 1) I consider renaming non-static functions in memcontrol-v1.c to have
> > > mem_cgroup_v1_ prefix. Is this a good idea?
> > > 2) Do we want to extend it beyond the memory controller? Should
> > > 3) Is it better to use a new include/linux/memcontrol-v1.h instead of
> > > mm/internal.h? Or mm/memcontrol-v1.h.
> > >
> >
> > Hi Roman,
> >
> > A very timely and important topic and we should definitely talk about it
> > during LSFMM as well. I have been thinking about this problem for quite
> > sometime and I am getting more and more convinced that we should aim to
> > completely deprecate memcg-v1.
> >
> > More specifically:
> >
> > 1. What are the memcg-v1 features which have no alternative in memcg-v2
> > and are blocker for memcg-v1 users? (setting aside the cgroup v2
> > structual restrictions)
> >
> > 2. What are unused memcg-v1 features which we should start deprecating?
> >
> > IMO we should systematically start deprecating memcg-v1 features and
> > start unblocking the users stuck on memcg-v1.
> >
> > Now regarding the proposal in this series, I think it can be a first
> > step but should not give an impression that we are done. The only
> > concern I have is the potential of "out of sight, out of mind" situation
> > with this change but if we keep the momentum of deprecation of memcg-v1
> > it should be fine.
> >
> > I have CCed Greg and David from Google to get their opinion on what
> > memcg-v1 features are blocker for their memcg-v2 migration and if they
> > have concern in deprecation of memcg-v1 features.
> >
> > Anyone else still on memcg-v1, please do provide your input.
> >
>
> Hi,
>
> Sorry for joining the discussion late, but I'd like to add some info
> here: We are using the "memsw" feature a lot. It's a very useful knob
> for container memory overcommitting: It's a great abstraction of the
> "expected total memory usage" of a container, so containers can't
> allocate too much memory using SWAP, but still be able to SWAP out.
>
> For a simple example, with memsw.limit == memory.limit, containers
> can't exceed their original memory limit, even with SWAP enabled, they
> get OOM killed as how they used to, but the host is now able to
> offload cold pages.
>
> Similar ability seems absent with V2: With memory.swap.max == 0, the
> host can't use SWAP to reclaim container memory at all. But with a
> value larger than that, containers are able to overuse memory, causing
> delayed OOM kill, thrashing, CPU/Memory usage ratio could be heavily
> out of balance, especially with compress SWAP backends.
>
> Cgroup accounting of ZSWAP/ZRAM doesn't really help, we want to
> account for the total raw usage, not the compressed usage. One example
> is that if a container uses tons of duplicated pages, then it can
> allocate much more memory than it is limited, that could cause
> trouble.
So you don't need separate swap knobs, only combined, right?
> I saw Chris also mentioned Google has a workaround internally for it
> for Cgroup V2. This will be a blocker for us and a similar workaround
> might be needed. It will be great so see an upstream support for this.
I think that _at least_ we should refactor the code so that it would
be a minimal patch (e.g. one #define) to switch to the old mode.
I don't think it's reasonable to add a new interface, but having a
patch/config option or even a mount option which changes the semantics
of memory.swap.max to the v1-like behavior should be ok.
I'll try to do the first part (refactoring this code), and we can have
a discussion from there.
Thanks!
next prev parent reply other threads:[~2024-05-23 19:55 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-09 3:41 Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 1/9] mm: memcg: introduce memcontrol-v1.c Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 2/9] mm: memcg: move soft limit reclaim code to memcontrol-v1.c Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 3/9] mm: memcg: move charge migration " Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 4/9] mm: memcg: move legacy memcg event code into memcontrol-v1.c Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 5/9] mm: memcg: move cgroup v1 interface files to memcontrol-v1.c Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 6/9] mm: memcg: move cgroup v1 oom handling code into memcontrol-v1.c Roman Gushchin
2024-05-10 13:26 ` Michal Hocko
2024-05-25 1:03 ` Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 7/9] mm: memcg: put cgroup v1-specific code under a config option Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 8/9] mm: memcg: put corresponding struct mem_cgroup members under CONFIG_MEMCG_V1 Roman Gushchin
2024-05-09 3:41 ` [PATCH rfc 9/9] mm: memcg: put cgroup v1-related members of task_struct under config option Roman Gushchin
2024-05-09 6:33 ` [PATCH rfc 0/9] mm: memcg: separate legacy cgroup v1 code and put " Shakeel Butt
2024-05-09 17:30 ` Roman Gushchin
2024-05-10 2:59 ` David Rientjes
2024-05-10 7:10 ` Chris Li
2024-05-10 8:10 ` Michal Hocko
2024-05-16 3:35 ` Yafang Shao
2024-05-16 17:29 ` Roman Gushchin
2024-05-17 2:21 ` Yafang Shao
2024-05-18 2:13 ` Roman Gushchin
2024-05-18 7:32 ` Shakeel Butt
2024-05-20 2:14 ` Yafang Shao
2024-05-22 17:58 ` Kairui Song
2024-05-23 19:55 ` Roman Gushchin [this message]
2024-05-23 20:26 ` Chris Li
2024-05-28 17:20 ` Kairui Song
2024-05-09 14:22 ` Johannes Weiner
2024-05-09 14:36 ` Johannes Weiner
2024-05-09 14:57 ` Roman Gushchin
2024-05-10 14:18 ` Johannes Weiner
2024-05-10 13:33 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zk-fQtFrj-2YDJOo@P9FQF9L96D.corp.robot.car \
--to=roman.gushchin@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=chrisl@kernel.org \
--cc=gthelen@google.coma \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=rientjes@google.com \
--cc=ryncsn@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox