From: Chris Li <chrisl@kernel.org>
To: Shakeel Butt <shakeelb@google.com>
Cc: "T.J. Mercier" <tjmercier@google.com>,
lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
cgroups@vger.kernel.org, Yosry Ahmed <yosryahmed@google.com>,
Tejun Heo <tj@kernel.org>, Muchun Song <muchun.song@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Alistair Popple <apopple@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Kalesh Singh <kaleshsingh@google.com>,
Yu Zhao <yuzhao@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Reducing zombie memcgs
Date: Thu, 4 May 2023 10:36:46 -0700
Message-ID: <ZFPtLjiSQxRw+isR@google.com>
In-Reply-To: <CALvZod4=+ANT6UR5h7Cp+0hKkVx6tPAaRa5iqBF=L2VBdMKERQ@mail.gmail.com>
On Thu, May 04, 2023 at 10:02:38AM -0700, Shakeel Butt wrote:
> On Wed, May 3, 2023 at 3:15 PM Chris Li <chrisl@kernel.org> wrote:
> [...]
> > I am also interested in this topic. T.J. and I have had some offline
> > discussions about this. We have some proposals to solve this
> > problem.
> >
> > I will share the write-up here for the upcoming LSF/MM discussion.
> >
> >
> > Shared Memory Cgroup Controllers
> >
> > = Introduction
> >
> > The current memory cgroup controller does not support shared memory objects. For memory that is shared between different processes, it is not obvious which process should get charged. Google has an internal tmpfs “memcg=” mount option to charge tmpfs data to a specific memcg that’s often different from where the charging processes run. However, it faces some difficulties when the charged memcg exits and becomes a zombie memcg.
>
> What is the exact problem this proposal is solving? Is it the zombie
> memcgs? To me that is just a side effect of memory shared between
> different memcgs.
I am trying to get rid of zombie memcgs by using shared memory controllers.
That also means finding an alternative solution for the "memcg=" usage.
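Roughly, the "memcg=" behavior I want to replace works like this (a toy userspace model, not the internal implementation; all names here are made up for illustration):

/*
 * Toy model of "memcg=" style charging: the mount, not the
 * allocating task, decides which memcg pays for the page.
 * Illustrative names only, not kernel APIs.
 */
struct memcg {
	long pages_charged;
};

struct tmpfs_mount {
	struct memcg *charge_target;	/* set by the memcg= mount option */
};

static void charge_tmpfs_page(struct tmpfs_mount *m, struct memcg *task_memcg)
{
	/*
	 * With memcg=, the charge lands on the mount's target memcg,
	 * which can outlive every task that touched the file.  That is
	 * where the zombie memcg comes from.
	 */
	struct memcg *target = m->charge_target ? m->charge_target : task_memcg;

	target->pages_charged++;
}

The smemcg proposal keeps the "charge away from the faulting task" property without parking the charge on a cgroup that can turn into a zombie.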
>
> > Other approaches include “re-parenting” the memcg charge to the parent memcg, which has its own problems: if the charge is huge, the reparenting iteration can be costly.
>
> What is the iteration of the reparenting? Are you referring to
> reparenting the LRUs or something else?
Yes, reparenting the LRUs. As Yu pointed out to me offline, that LRU
iteration is on an offlined memcg, so it shouldn't block anything
major.
Still, I think the smemcg offers more than recharging when it comes to
modeling the sharing relationship.
> >
> > = Proposed Solution
> >
> > The proposed solution is to add a new type of memory controller for shared memory usage, e.g. tmpfs, hugetlb, file system mmap and dma_buf. This shared memory cgroup controller object will have the same life cycle as the underlying shared memory.
>
> I am confused by the relationship between shared memory controller and
> the underlying shared memory. What does the same life cycle mean? Are
Same life cycle means that if the smemcg comes from a tmpfs, the smemcg
has the same life cycle as the tmpfs.
> the users expected to register the shared memory objects with the
> smemcg? What about unnamed shared memory objects like MAP_SHARED or
The user doesn't need to register shared memory objects. However, the file
system code might need to, but I count that as the kernel, not the user.
The cgroup admin adds the smemcg into the memcg that shares it. We could
also add a policy option for the kernel to automatically add the smemcg to
the memcgs that share from it.
> memfds?
Each file system mount will have its own smemcg.
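To make the life cycle concrete, here is a minimal sketch of what I have in mind (userspace toy model, invented names): the smemcg is created at mount time and freed at unmount, so it can never outlive the memory it accounts for.

#include <stdlib.h>

/* Sketch: the smemcg is owned by the shared memory object (here a
 * tmpfs mount) and shares its life cycle.  Invented names only. */
struct smemcg {
	long charged_pages;	/* charged once, on the smemcg itself */
};

struct tmpfs_sb_info {
	struct smemcg *smemcg;	/* created together with the mount */
};

static struct tmpfs_sb_info *tmpfs_mount_new(void)
{
	struct tmpfs_sb_info *sbi = calloc(1, sizeof(*sbi));

	if (!sbi)
		return NULL;
	sbi->smemcg = calloc(1, sizeof(*sbi->smemcg));
	if (!sbi->smemcg) {
		free(sbi);
		return NULL;
	}
	return sbi;
}

static void tmpfs_umount(struct tmpfs_sb_info *sbi)
{
	/* The smemcg dies with the mount, so no zombie is left behind. */
	free(sbi->smemcg);
	free(sbi);
}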
>
> How does the charging work for smemcg? Is this new controller hierarchical?
Charging only happens once, on the smemcg.
Please see this thread for the distinction between the charge counter and the borrow counter:
https://lore.kernel.org/linux-mm/CALvZod4=+ANT6UR5h7Cp+0hKkVx6tPAaRa5iqBF=L2VBdMKERQ@mail.gmail.com/T/#m955cab80f70097d7c9a5be21c19c4851170fa052
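To restate that distinction as code (a rough model with invented names): the smemcg holds the single charge, and each memcg the smemcg is added to only tracks how much it borrows, so adding the smemcg to a second memcg adds another borrow counter, not another charge.

/* Rough model of "charge once on the smemcg, borrow per memcg".
 * Structure and function names are invented for illustration. */
struct smemcg {
	long charged;			/* total pages, charged exactly once */
};

struct smemcg_link {			/* one per (smemcg, memcg) pair */
	struct smemcg *smemcg;
	long borrowed;			/* this memcg's share of the smemcg */
};

/*
 * A shared page is faulted in through a given memcg: charge the smemcg
 * only if the page is new to it, but always bump this memcg's borrow
 * counter.  A second memcg mapping the same page bumps only its own
 * borrow counter.
 */
static void smemcg_borrow_page(struct smemcg_link *link, int newly_charged)
{
	if (newly_charged)
		link->smemcg->charged++;
	link->borrowed++;
}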
I haven't given too much thought to the hierarchical issue.
My initial thought is that the smemcg side is not hierarchical,
while the memcg side is.
Do you have specific examples we can use to discuss hierarchical usage?
> > Processes can not be added to the shared memory cgroup. Instead the shared memory cgroup can be added to the memcg using a “smemcg” API file, similar to adding a process into the “tasks” API file.
>
> Does the charge of the underlying shared memory live with the smemcg or the
> memcg where the smemcg is attached? Can a smemcg detach and reattach to a
The charge lives with the smemcg.
> different memcg?
A smemcg can be added to more than one memcg without being detached.
Please see the email linked above regarding borrow vs. charge.
>
> > When a smemcg is added to the memcg, the amount of memory that has been shared with the memcg's processes will be accounted for as part of the memcg “memory.current”. The memory.current of the memcg is made up of two parts: 1) the processes' anonymous memory and 2) the memory shared from the smemcg.
>
> The above is somewhat giving the impression that the charge of shared
> memory lives with smemcg. This can mess up or complicate the
> hierarchical property of the original memcg.
I haven't given a lot of thought to that. Can you share an example of
how things can get messed up? I will see if I can use the smemcg model to
address it.
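For reference, the accounting I have in mind for a single memcg is simply additive (sketch with invented names); how it should roll up the hierarchy is exactly the open question:

/* Sketch of the two-part accounting: memory.current = the processes'
 * anonymous memory + pages borrowed from each smemcg added to this
 * memcg.  Invented names only. */
struct smemcg_link {
	long borrowed;
};

struct memcg {
	long anon_pages;		/* 1) processes' anonymous memory */
	struct smemcg_link links[8];	/* 2) smemcgs added via the API file */
	int nr_links;
};

static long memcg_current(const struct memcg *mc)
{
	long total = mc->anon_pages;

	for (int i = 0; i < mc->nr_links; i++)
		total += mc->links[i].borrowed;	/* shared memory counted here */
	return total;
}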
> > When the memcg “memory.current” reaches the limit, the kernel will actively try to reclaim for the memcg to bring “smemcg memory + process anonymous memory” within the limit. Further memory allocations within those memcg processes will fail if the limit cannot be met. If many reclaim attempts fail to bring the memcg “memory.current” within the limit, the processes in this memcg will get OOM killed.
>
> The OOM killing for remote charging needs much more thought. Please
> see https://lwn.net/Articles/787626/ for previous discussion on
> related topic.
Yes, I just took a look. I think the new idea here is borrow vs. charge.
Let's come up with some detailed examples and try to break the smemcg borrow model.
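As a starting point for those examples, this is how I picture the limit enforcement described in the write-up (sketch only, invented names; the real charge and reclaim paths are of course far more involved):

#include <stdbool.h>

/* Sketch: keep "borrowed smemcg memory + process anonymous memory"
 * within the memcg limit; reclaim when over, OOM after repeated
 * failures.  Invented names only. */
struct memcg {
	long anon_pages;
	long borrowed_pages;
	long limit;
};

/* Stand-ins for LRU reclaim and the OOM killer. */
static long try_reclaim(struct memcg *mc, long want) { (void)mc; (void)want; return 0; }
static void oom_kill_in(struct memcg *mc) { (void)mc; }

static bool memcg_try_charge(struct memcg *mc, long nr_pages, bool is_anon)
{
	for (int attempt = 0; attempt < 5; attempt++) {
		long usage = mc->anon_pages + mc->borrowed_pages;

		if (usage + nr_pages <= mc->limit) {
			if (is_anon)
				mc->anon_pages += nr_pages;
			else
				mc->borrowed_pages += nr_pages;	/* borrow, not charge */
			return true;
		}
		try_reclaim(mc, usage + nr_pages - mc->limit);
	}
	oom_kill_in(mc);	/* repeated reclaim failed */
	return false;
}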
Chris