From: Roman Gushchin <guro@fb.com>
To: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guroan@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Kernel Team <Kernel-team@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>, Rik van Riel <riel@surriel.com>,
"david@fromorbit.com" <david@fromorbit.com>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>
Subject: Re: [PATCH 0/5] mm: reparent slab memory on cgroup removal
Date: Thu, 18 Apr 2019 18:27:17 +0000
Message-ID: <20190418182714.GD11008@tower.DHCP.thefacebook.com>
In-Reply-To: <20190418081538.prspe27lqudvvu3u@esperanza>
On Thu, Apr 18, 2019 at 11:15:38AM +0300, Vladimir Davydov wrote:
> Hello Roman,
>
> On Wed, Apr 17, 2019 at 02:54:29PM -0700, Roman Gushchin wrote:
> > There is however a significant problem with reparenting of slab memory:
> > there is no list of charged pages. Some of them are in shrinker lists,
> > but not all. Introducing a new list is really not an option.
>
> True, introducing a list of charged pages would negatively affect
> SL[AU]B performance since we would need to protect it with some kind
> of lock.
>
> >
> > But fortunately there is a way forward: every slab page has a stable pointer
> > to the corresponding kmem_cache. So the idea is to reparent kmem_caches
> > instead of slab pages.
> >
> > It's actually simpler and cheaper, but requires some underlying changes:
> > 1) Make kmem_caches hold a single reference to the memory cgroup,
> > instead of a separate reference per slab page.
> > 2) Stop setting page->mem_cgroup pointer for memcg slab pages and use
> > page->kmem_cache->memcg indirection instead. It's used only on
> > slab page release, so it shouldn't be a big issue.
> > 3) Introduce a refcounter for non-root slab caches. It's required to
> > be able to destroy kmem_caches when they become empty and release
> > the associated memory cgroup.
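(To make (2) above concrete: the release path could look roughly like the
sketch below. The exact helper and field names, e.g. s->memcg_params.memcg
and __memcg_kmem_uncharge_memcg() from patch 3, are illustrative here, not
necessarily the final form.)

/*
 * Sketch: slab page release no longer reads page->mem_cgroup; the memcg
 * is reached through the kmem_cache the page belongs to.
 */
static __always_inline void memcg_uncharge_slab(struct page *page, int order,
						struct kmem_cache *s)
{
	struct mem_cgroup *memcg = s->memcg_params.memcg;

	/* uncharge 2^order pages from the memcg the cache points to
	 * (vmstat accounting omitted here) */
	__memcg_kmem_uncharge_memcg(memcg, 1 << order);
}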
>
> Which means an unconditional atomic inc/dec on charge/uncharge paths
> AFAIU. Note, we have per cpu batching so charging a kmem page in cgroup
> v2 doesn't require an atomic variable modification. I guess you could
> use some sort of per cpu ref counting though.
Yes, it looks like I'll have to switch to a percpu counter (see the thread
with Shakeel).
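Roughly what that could look like (a sketch only, assuming the percpu_ref
sits next to the memcg pointer in memcg_cache_params; the names and the
release path are illustrative):

/* needs <linux/percpu-refcount.h> and <linux/memcontrol.h> */

struct memcg_cache_params {
	struct mem_cgroup *memcg;	/* single reference to the memcg */
	struct percpu_ref refcnt;	/* counts charged slab pages */
	/* ... */
};

static void kmemcg_cache_release(struct percpu_ref *ref)
{
	struct memcg_cache_params *params =
		container_of(ref, struct memcg_cache_params, refcnt);

	/* no charged slab pages left: drop the memcg reference and
	 * schedule the actual kmem_cache destruction */
	css_put(&params->memcg->css);
}

/*
 * At cache creation:
 *	percpu_ref_init(&params->refcnt, kmemcg_cache_release, 0, GFP_KERNEL);
 * When the cache is emptied/deactivated:
 *	percpu_ref_kill(&params->refcnt);
 */

static inline void kmemcg_cache_get(struct memcg_cache_params *params)
{
	/* charge path: a this_cpu inc while the ref is in percpu mode */
	percpu_ref_get(&params->refcnt);
}

static inline void kmemcg_cache_put(struct memcg_cache_params *params)
{
	/* slab page release, reached via page->slab_cache->memcg_params */
	percpu_ref_put(&params->refcnt);
}

So the hot charge/uncharge paths stay per-cpu, and the only atomic cost is
at the very end, once the cache has no charged pages left.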
>
> Anyway, releasing mem_cgroup objects, but leaving kmem_cache objects
> dangling looks kinda awkward to me. It would be great if we could
> release both, but I assume it's hardly possible due to SL[AU]B
> complexity.
Kmem_caches are *much* smaller than memcgs. If the size of a kmem_cache
is smaller than the size of the objects pinning it, I think that's
acceptable. I hope to release all associated percpu memory early to make
it even smaller.
On the other hand, memcgs are much larger than the typical objects
pinning them (dentries and inodes), and they tend to grow as new features
are added.
I agree that releasing both would be cool, but I doubt it's possible.
>
> What about reusing dead cgroups instead? Yeah, it would be kinda unfair,
> because a fresh cgroup would get a legacy of objects left from previous
> owners, but still, if we delete a cgroup, the workload must be dead and
> so apart from a few long-lived objects, there should mostly be cached
> objects charged to it, which should be easily released on memory
> pressure. Sorry if somebody's asked this question before - I must have
> missed that.
It's an interesting idea. The problem is that a dying cgroup can remain
an almost fully functional cgroup for a long time: it can have associated
sockets, page cache, kernel objects, etc. It's still a part of the cgroup
tree, all constraints and limits still apply, and it might have some
background activity.
Thanks!
Thread overview: 18+ messages
2019-04-17 21:54 Roman Gushchin
2019-04-17 21:54 ` [PATCH 1/5] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache() Roman Gushchin
2019-04-17 21:54 ` [PATCH 2/5] mm: generalize postponed non-root kmem_cache deactivation Roman Gushchin
2019-04-17 21:54 ` [PATCH 3/5] mm: introduce __memcg_kmem_uncharge_memcg() Roman Gushchin
2019-04-17 21:54 ` [PATCH 4/5] mm: rework non-root kmem_cache lifecycle management Roman Gushchin
2019-04-17 23:41 ` Shakeel Butt
2019-04-18 0:38 ` Roman Gushchin
2019-04-18 1:55 ` Shakeel Butt
2019-04-18 3:07 ` Roman Gushchin
2019-04-18 14:05 ` Shakeel Butt
2019-04-18 18:14 ` Roman Gushchin
2019-04-18 13:34 ` Christopher Lameter
2019-04-18 18:04 ` Roman Gushchin
2019-04-18 13:38 ` Christopher Lameter
2019-04-18 18:05 ` Roman Gushchin
2019-04-17 21:54 ` [PATCH 5/5] mm: reparent slab memory on cgroup removal Roman Gushchin
2019-04-18 8:15 ` [PATCH 0/5] " Vladimir Davydov
2019-04-18 18:27 ` Roman Gushchin [this message]