From: Waiman Long <longman@redhat.com>
To: Yosry Ahmed <yosryahmed@google.com>,
	"T.J. Mercier" <tjmercier@google.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Alistair Popple <apopple@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	Yu Zhao <yuzhao@google.com>, Matthew Wilcox <willy@infradead.org>,
	David Rientjes <rientjes@google.com>,
	Greg Thelen <gthelen@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Reducing zombie memcgs
Date: Tue, 25 Apr 2023 14:42:41 -0400
Message-ID: <27e15be8-d0eb-ed32-a0ec-5ec9b59f1f27@redhat.com>
In-Reply-To: <CAJD7tkb56gR0X5v3VHfmk3az3bOz=wF2jhEi+7Eek0J8XXBeWQ@mail.gmail.com>

On 4/25/23 07:36, Yosry Ahmed wrote:
>   +David Rientjes +Greg Thelen +Matthew Wilcox
>
> On Tue, Apr 11, 2023 at 4:48 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>> On Tue, Apr 11, 2023 at 4:36 PM T.J. Mercier <tjmercier@google.com> wrote:
>>> When a memcg is removed by userspace it gets offlined by the kernel.
>>> Offline memcgs are hidden from user space, but they still live in the
>>> kernel until their reference count drops to 0. New allocations cannot
>>> be charged to offline memcgs, but existing allocations charged to
>>> offline memcgs remain charged, and hold a reference to the memcg.
>>>
>>> As such, an offline memcg can remain in the kernel indefinitely,
>>> becoming a zombie memcg. The accumulation of a large number of zombie
>>> memcgs leads to increased system overhead (mainly percpu data in struct
>>> mem_cgroup). It also causes some kernel operations that scale with the
>>> number of memcgs to become less efficient (e.g. reclaim).
>>>
>>> There are currently out-of-tree solutions which attempt to
>>> periodically clean up zombie memcgs by reclaiming from them. However
>>> that is not effective for non-reclaimable memory, which it would be
>>> better to reparent or recharge to an online cgroup. There are also
>>> proposed changes that would benefit from recharging for shared
>>> resources like pinned pages, or DMA buffer pages.
>> I am very interested in attending this discussion, it's something that
>> I have been actively looking into -- specifically recharging pages of
>> offlined memcgs.
>>
>>> Suggested attendees:
>>> Yosry Ahmed <yosryahmed@google.com>
>>> Yu Zhao <yuzhao@google.com>
>>> T.J. Mercier <tjmercier@google.com>
>>> Tejun Heo <tj@kernel.org>
>>> Shakeel Butt <shakeelb@google.com>
>>> Muchun Song <muchun.song@linux.dev>
>>> Johannes Weiner <hannes@cmpxchg.org>
>>> Roman Gushchin <roman.gushchin@linux.dev>
>>> Alistair Popple <apopple@nvidia.com>
>>> Jason Gunthorpe <jgg@nvidia.com>
>>> Kalesh Singh <kaleshsingh@google.com>
> I was hoping I would bring a more complete idea to this thread, but
> here is what I have so far.
>
> The idea is to recharge the memory charged to memcgs when they are
> offlined. I like to think of the options we have to deal with memory
> charged to offline memcgs as a toolkit. This toolkit includes:
>
> (a) Evict memory.
>
> This is the simplest option, just evict the memory.
>
> For file-backed pages, this writes them back to their backing files,
> uncharging and freeing the page. The next access will read the page
> again and the faulting process’s memcg will be charged.
>
> For swap-backed pages (anon/shmem), this swaps them out. Swapping out
> a page charged to an offline memcg uncharges the page and charges the
> swap to its parent. The next access will swap in the page and the
> parent will be charged. This is effectively deferred recharging to the
> parent.
>
> Pros:
> - Simple.
>
> Cons:
> - Behavior is different for file-backed vs. swap-backed pages. For
> swap-backed pages, the memory is recharged to the parent (aka
> reparented), not charged to the "rightful" user.
> - Next access will incur higher latency, especially if the pages are active.
>
> (b) Direct recharge to the parent
>
> This can be done for any page and should be simple as the pages are
> already hierarchically charged to the parent.
>
> Pros:
> - Simple.
>
> Cons:
> - If a different memcg is using the memory, it will keep taxing the
> parent indefinitely. This is the same not-the-"rightful"-user argument.

Muchun actually posted a patch to do this last year. See

https://lore.kernel.org/all/20220621125658.64935-10-songmuchun@bytedance.com/T/#me9dbbce85e2f3c4e5f34b97dbbdb5f79d77ce147
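
For anyone following along, option (b) can be pictured with a toy
Python model of the counters. This is only a sketch under my own
assumptions: the Memcg class, its fields, and the offline() and
reparent() helpers are made up for illustration and do not correspond
to kernel code or to the series linked above. It relies on the point
made in the quoted text that pages are already hierarchically charged
to the parent, so moving ownership up one level leaves the parent's
usage unchanged and drops the last references to the offline memcg.

```python
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not struct mem_cgroup)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.online = True
        self.usage = 0      # hierarchical: includes charges of descendants
        self.pages = set()  # pages charged directly; each one pins this memcg

    def ancestors_and_self(self):
        cg = self
        while cg is not None:
            yield cg
            cg = cg.parent

    def charge(self, page):
        # New allocations cannot be charged to an offline memcg.
        if not self.online:
            raise RuntimeError("cannot charge an offline memcg")
        self.pages.add(page)
        for cg in self.ancestors_and_self():
            cg.usage += 1

    def is_zombie(self):
        # Offline but still pinned by existing charges.
        return not self.online and bool(self.pages)


def offline(cg):
    """Model userspace removing the cgroup; existing charges stay put."""
    cg.online = False


def reparent(cg):
    """Option (b): hand all direct charges of cg to its parent."""
    if cg.parent is None:
        raise ValueError("the root memcg has no parent to recharge to")
    moved = len(cg.pages)
    cg.parent.pages |= cg.pages
    cg.pages.clear()
    # Only cg's own counter changes; ancestors already counted these pages.
    cg.usage -= moved
    return moved


if __name__ == "__main__":
    root = Memcg("root")
    parent = Memcg("parent", root)
    child = Memcg("child", parent)
    child.charge("page-1")
    child.charge("page-2")
    offline(child)
    print(child.is_zombie(), parent.usage)
    reparent(child)
    print(child.is_zombie(), parent.usage)
```

The model also shows the downside from the quoted list: after
reparent(), the parent owns the pages no matter which memcg is actually
touching them.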

I am wondering whether he is going to post an updated version of that.
Anyway, I am looking forward to learning about the result of this
discussion even though I am not a conference invitee.

Thanks,
Longman



