From: Waiman Long <longman@redhat.com>
To: Yosry Ahmed <yosryahmed@google.com>,
"T.J. Mercier" <tjmercier@google.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
cgroups@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <muchun.song@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Alistair Popple <apopple@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Kalesh Singh <kaleshsingh@google.com>,
Yu Zhao <yuzhao@google.com>, Matthew Wilcox <willy@infradead.org>,
David Rientjes <rientjes@google.com>,
Greg Thelen <gthelen@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Reducing zombie memcgs
Date: Tue, 25 Apr 2023 14:42:41 -0400
Message-ID: <27e15be8-d0eb-ed32-a0ec-5ec9b59f1f27@redhat.com>
In-Reply-To: <CAJD7tkb56gR0X5v3VHfmk3az3bOz=wF2jhEi+7Eek0J8XXBeWQ@mail.gmail.com>
On 4/25/23 07:36, Yosry Ahmed wrote:
> +David Rientjes +Greg Thelen +Matthew Wilcox
>
> On Tue, Apr 11, 2023 at 4:48 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>> On Tue, Apr 11, 2023 at 4:36 PM T.J. Mercier <tjmercier@google.com> wrote:
>>> When a memcg is removed by userspace it gets offlined by the kernel.
>>> Offline memcgs are hidden from user space, but they still live in the
>>> kernel until their reference count drops to 0. New allocations cannot
>>> be charged to offline memcgs, but existing allocations charged to
>>> offline memcgs remain charged, and hold a reference to the memcg.
>>>
>>> As such, an offline memcg can remain in the kernel indefinitely,
>>> becoming a zombie memcg. The accumulation of a large number of zombie
>>> memcgs leads to increased system overhead (mainly percpu data in struct
>>> mem_cgroup). It also causes some kernel operations that scale with the
>>> number of memcgs to become less efficient (e.g. reclaim).
>>>
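As an aside, the scale of this problem is easy to observe from userspace:
cgroup v2 reports the number of dying descendants in the root's cgroup.stat
file. A minimal sketch, assuming the v2 hierarchy is mounted at
/sys/fs/cgroup, and keeping in mind that nr_dying_descendants counts all
dying cgroups, not only those pinned by memcg charges:

/* zombie_count.c - print the number of dying (removed but still
 * referenced) descendant cgroups.  Assumes cgroup2 is mounted at
 * /sys/fs/cgroup; adjust the path if it is mounted elsewhere. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/sys/fs/cgroup/cgroup.stat", "r");
	char key[64];
	unsigned long val;

	if (!f) {
		perror("cgroup.stat");
		return 1;
	}
	while (fscanf(f, "%63s %lu", key, &val) == 2) {
		if (!strcmp(key, "nr_dying_descendants"))
			printf("dying (zombie) cgroups: %lu\n", val);
	}
	fclose(f);
	return 0;
}
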
>>> There are currently out-of-tree solutions which attempt to
>>> periodically clean up zombie memcgs by reclaiming from them. However,
>>> that is not effective for non-reclaimable memory, which would be better
>>> reparented or recharged to an online cgroup. There are also
>>> proposed changes that would benefit from recharging for shared
>>> resources like pinned pages, or DMA buffer pages.
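The out-of-tree reapers I have seen all follow roughly the same in-kernel
pattern: walk the memcg hierarchy, skip the online groups, and poke reclaim
at the offline ones. A rough sketch only, not any particular implementation;
note that the trailing arguments of try_to_free_mem_cgroup_pages() have
changed across kernel versions:

#include <linux/memcontrol.h>
#include <linux/swap.h>

static void reap_zombie_memcgs(void)
{
	struct mem_cgroup *memcg;

	/* Walk every memcg in the system, starting from the root. */
	for (memcg = mem_cgroup_iter(NULL, NULL, NULL); memcg;
	     memcg = mem_cgroup_iter(NULL, memcg, NULL)) {
		if (mem_cgroup_online(memcg))
			continue;
		/*
		 * Offline, i.e. a zombie: try to push some of its pages
		 * out.  Shown with the older boolean may_swap argument;
		 * newer kernels take a reclaim_options mask instead.
		 */
		try_to_free_mem_cgroup_pages(memcg, SWAP_CLUSTER_MAX,
					     GFP_KERNEL, true);
	}
}

As the proposal says, this only helps for reclaimable memory; pinned or
otherwise unreclaimable pages keep the zombie alive no matter how often
such a loop runs.
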
>> I am very interested in attending this discussion; it's something that
>> I have been actively looking into -- specifically recharging pages of
>> offlined memcgs.
>>
>>> Suggested attendees:
>>> Yosry Ahmed <yosryahmed@google.com>
>>> Yu Zhao <yuzhao@google.com>
>>> T.J. Mercier <tjmercier@google.com>
>>> Tejun Heo <tj@kernel.org>
>>> Shakeel Butt <shakeelb@google.com>
>>> Muchun Song <muchun.song@linux.dev>
>>> Johannes Weiner <hannes@cmpxchg.org>
>>> Roman Gushchin <roman.gushchin@linux.dev>
>>> Alistair Popple <apopple@nvidia.com>
>>> Jason Gunthorpe <jgg@nvidia.com>
>>> Kalesh Singh <kaleshsingh@google.com>
> I was hoping I would bring a more complete idea to this thread, but
> here is what I have so far.
>
> The idea is to recharge the memory charged to memcgs when they are
> offlined. I like to think of the options we have to deal with memory
> charged to offline memcgs as a toolkit. This toolkit includes:
>
> (a) Evict memory.
>
> This is the simplest option: just evict the memory.
>
> For file-backed pages, this writes them back to their backing files,
> uncharging and freeing the page. The next access will read the page
> again and the faulting process’s memcg will be charged.
>
> For swap-backed pages (anon/shmem), this swaps them out. Swapping out
> a page charged to an offline memcg uncharges the page and charges the
> swap to its parent. The next access will swap in the page and the
> parent will be charged. This is effectively deferred recharging to the
> parent.
>
> Pros:
> - Simple.
>
> Cons:
> - Behavior is different for file-backed vs. swap-backed pages: for
> swap-backed pages, the memory is recharged to the parent (aka
> reparented), not charged to the "rightful" user.
> - Next access will incur higher latency, especially if the pages are active.
>
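A side note on (a): for cgroups that are still online, this tool is already
exposed to userspace as memory.reclaim (cgroup v2, v5.19+), so a job manager
can proactively evict before deleting a cgroup and leave fewer charges behind
to become zombies in the first place. A minimal sketch, with
/sys/fs/cgroup/example standing in for whatever cgroup is about to be removed:

/* proactive_reclaim.c - ask the kernel to reclaim up to 64M from an
 * online cgroup before it is deleted.  The cgroup path is only an
 * example; memory.reclaim requires cgroup v2 on a v5.19+ kernel. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/fs/cgroup/example/memory.reclaim", "w");

	if (!f) {
		perror("memory.reclaim");
		return 1;
	}
	/* The value written is a byte count; reclaim is best-effort. */
	fprintf(f, "%llu", 64ULL << 20);
	if (fclose(f)) {
		perror("memory.reclaim write");
		return 1;
	}
	return 0;
}
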
> (b) Direct recharge to the parent
>
> This can be done for any page and should be simple, as the pages are
> already hierarchically charged to the parent.
>
> Pros:
> - Simple.
>
> Cons:
> - If a different memcg is using the memory, it will keep taxing the
> parent indefinitely. The same "not the rightful user" argument applies.
Muchun had actually posted a patch series to do this last year. See
https://lore.kernel.org/all/20220621125658.64935-10-songmuchun@bytedance.com/T/#me9dbbce85e2f3c4e5f34b97dbbdb5f79d77ce147
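Conceptually, what that series does at offline time is something like the
sketch below. This is my rough paraphrase rather than the actual patch:
reparent_lru_folios() is a made-up name standing in for the real LRU
reparenting machinery, while obj_cgroup reparenting for kernel memory
already exists in mainline.

static void reparent_charges_on_offline(struct mem_cgroup *memcg)
{
	struct mem_cgroup *parent = parent_mem_cgroup(memcg);

	if (!parent)
		parent = root_mem_cgroup;

	/*
	 * Pages are already hierarchically charged to the parent, so no
	 * page_counter transfer is needed.  What changes is the ownership
	 * pointer of each remaining charge, so the charges stop pinning
	 * this memcg and it can actually be freed.
	 */
	reparent_lru_folios(memcg, parent);	/* hypothetical helper */
	memcg_reparent_objcgs(memcg, parent);	/* exists for kmem today */
}

The cost, as Yosry notes above, is that everything lands on the parent
rather than on whoever is actually using the memory.
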
I am wondering if he is going to post an updated version of that or not.
Anyway, I am looking forward to learning about the result of this
discussion even though I am not a conference invitee.
Thanks,
Longman