Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file

From: Waiman Long <longman@redhat.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@kernel.org>, Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file
Date: Thu, 20 Jun 2019 10:23:41 -0400
Message-ID: <cfc6c800-1cb4-e2f2-e6d9-f0571c11a47b@redhat.com>
In-Reply-To: <CALvZod7pdOx0a1v4oX5-7ZfCykM8iwRwPkW-+gbO1B4+j1SXqw@mail.gmail.com>

On 6/19/19 7:48 PM, Shakeel Butt wrote:
> Hi Waiman,
>
> On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@redhat.com> wrote:
>> There are concerns about memory leaks from extensive use of memory
>> cgroups as each memory cgroup creates its own set of kmem caches. There
>> is a possibility that the memcg kmem caches may remain even after the
>> memory cgroups have been offlined. Therefore, it will be useful to show
>> the status of each of the memcg kmem caches.
>>
>> This patch introduces a new <debugfs>/memcg_slabinfo file which is
>> somewhat similar to /proc/slabinfo in format, but lists only information
>> about kmem caches that have child memcg kmem caches. Information
>> available in /proc/slabinfo is not repeated in memcg_slabinfo.
>>
>> A portion of a sample output of the file was:
>>
>>   # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
>>   rpc_inode_cache   root          13     51      1      1
>>   rpc_inode_cache     48           0      0      0      0
>>   fat_inode_cache   root           1     45      1      1
>>   fat_inode_cache     41           2     45      1      1
>>   xfs_inode         root         770    816     24     24
>>   xfs_inode           92          22     34      1      1
>>   xfs_inode           88:dead      1     34      1      1
>>   xfs_inode           89:dead     23     34      1      1
>>   xfs_inode           85           4     34      1      1
>>   xfs_inode           84           9     34      1      1
>>
>> The css id of the memcg is also listed. If a memcg is not online,
>> the tag ":dead" will be attached as shown above.
>>
>> Suggested-by: Shakeel Butt <shakeelb@google.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 57 insertions(+)
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 58251ba63e4a..2bca1558a722 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/uaccess.h>
>>  #include <linux/seq_file.h>
>>  #include <linux/proc_fs.h>
>> +#include <linux/debugfs.h>
>>  #include <asm/cacheflush.h>
>>  #include <asm/tlbflush.h>
>>  #include <asm/page.h>
>> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
>>         return 0;
>>  }
>>  module_init(slab_proc_init);
>> +
>> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
>> +/*
>> + * Display information about kmem caches that have child memcg caches.
>> + */
>> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
>> +{
>> +       struct kmem_cache *s, *c;
>> +       struct slabinfo sinfo;
>> +
>> +       mutex_lock(&slab_mutex);
> On large machines there can be thousands of memcgs and potentially
> each memcg can have hundreds of kmem caches. So, the slab_mutex can be
> held for a very long time.

But that is also what /proc/slabinfo does by doing mutex_lock() at
slab_start() and mutex_unlock() at slab_stop(). So the same problem will
happen when /proc/slabinfo is being read.

When you are in a situation where reading /proc/slabinfo takes a long time
because of the large number of memcgs, the system is in some kind of
trouble anyway. I am not saying that we should not improve the scalability
of this patch. It is just that some nasty race conditions may pop up if
we release the lock and re-acquire it later. That will greatly
complicate the code to handle all those edge cases.
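
To illustrate the kind of edge case meant here, below is a minimal
userspace sketch, not kernel code. It assumes a hypothetical cache_list
protected by a pthread mutex standing in for slab_mutex, and a generation
counter that every modification bumps. The names cache_list,
cache_list_pop() and walk_in_batches() are made up for this example. The
walker drops the lock between batches and has to give up once the list has
changed underneath it, because its saved position can no longer be trusted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CACHES 64

/* Hypothetical stand-in for the global list of kmem caches. */
struct cache_list {
	pthread_mutex_t lock;       /* plays the role of slab_mutex */
	unsigned long gen;          /* bumped on every modification */
	size_t len;
	long objs[MAX_CACHES];      /* per-cache object counts */
};

void cache_list_init(struct cache_list *cl, const long *objs, size_t len)
{
	assert(len <= MAX_CACHES);
	pthread_mutex_init(&cl->lock, NULL);
	cl->gen = 0;
	cl->len = len;
	for (size_t i = 0; i < len; i++)
		cl->objs[i] = objs[i];
}

/* Remove the last entry, as a concurrent cache destruction might. */
void cache_list_pop(struct cache_list *cl)
{
	pthread_mutex_lock(&cl->lock);
	if (cl->len > 0) {
		cl->len--;
		cl->gen++;
	}
	pthread_mutex_unlock(&cl->lock);
}

/*
 * Sum the object counts, holding the lock for at most `batch` entries at
 * a time. `unlocked_hook` (may be NULL) runs while the lock is dropped and
 * simulates whatever another task does in that window. Returns false if
 * the list changed during the walk, in which case *total is incomplete.
 */
bool walk_in_batches(struct cache_list *cl, size_t batch,
		     void (*unlocked_hook)(struct cache_list *), long *total)
{
	size_t pos = 0;
	unsigned long seen_gen;
	bool ok = true;

	*total = 0;
	if (batch == 0)
		return false;

	pthread_mutex_lock(&cl->lock);
	seen_gen = cl->gen;
	while (pos < cl->len) {
		size_t end = pos + batch < cl->len ? pos + batch : cl->len;

		for (; pos < end; pos++)
			*total += cl->objs[pos];
		if (pos == cl->len)
			break;

		/* Drop the lock so other tasks can make progress. */
		pthread_mutex_unlock(&cl->lock);
		if (unlocked_hook)
			unlocked_hook(cl);
		pthread_mutex_lock(&cl->lock);

		/* The saved position is meaningless if the list changed. */
		if (cl->gen != seen_gen) {
			ok = false;
			break;
		}
	}
	pthread_mutex_unlock(&cl->lock);
	return ok;
}
```

Even in this toy form the caller must decide what to do on failure:
restart from the beginning, report partial output, or pin the current
entry with a reference before unlocking. Holding the lock for the whole
walk avoids all of these choices.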

> Our internal implementation traverses the memcg tree and then
> traverses 'memcg->kmem_caches' within the slab_mutex (and
> cond_resched() after unlock).

For cgroup v1, setting the CONFIG_SLUB_DEBUG option will allow
you to iterate and display slabinfo just for that particular memcg. I am
thinking of extending the debug controller to do a similar thing for
cgroup v2.

>> +       seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
>> +       seq_puts(m, " <active_slabs> <num_slabs>\n");
>> +       list_for_each_entry(s, &slab_root_caches, root_caches_node) {
>> +               /*
>> +                * Skip kmem caches that don't have any memcg children.
>> +                */
>> +               if (list_empty(&s->memcg_params.children))
>> +                       continue;
>> +
>> +               memset(&sinfo, 0, sizeof(sinfo));
>> +               get_slabinfo(s, &sinfo);
>> +               seq_printf(m, "%-17s root      %6lu %6lu %6lu %6lu\n",
>> +                          cache_name(s), sinfo.active_objs, sinfo.num_objs,
>> +                          sinfo.active_slabs, sinfo.num_slabs);
>> +
>> +               for_each_memcg_cache(c, s) {
>> +                       struct cgroup_subsys_state *css;
>> +                       char *dead = "";
>> +
>> +                       css = &c->memcg_params.memcg->css;
>> +                       if (!(css->flags & CSS_ONLINE))
>> +                               dead = ":dead";
> Please note that Roman's kmem cache reparenting patch series has made
> kmem caches of zombie memcgs a bit tricky. On memcg offlining the
> memcg kmem caches are reparented and the css->id can get recycled. So,
> we want to know that a kmem cache is reparented and which memcg it
> belonged to initially. To determine if a kmem cache is reparented, we
> can store a flag on the kmem cache and for the previous memcg we can
> use fhandle. However to not make this more complicated, for now, we
> can just have the info that the kmem cache was reparented i.e. belongs
> to an offlined memcg.

I need to play with Roman's kmem cache reparenting patch a bit more to
see how to properly recognize a reparented kmem cache. What I have
noticed is that the dead kmem caches that I saw at boot up were gone
after applying his patch. So that is a good thing.

For now, I think the current patch is good enough for its purpose. I may
send a follow-up if I see something that can be improved.
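
For anyone who wants to consume the file from a tool, the line format in
the sample output quoted above could be parsed roughly as follows. This is
a userspace sketch that assumes the v2 column layout (name, css id with an
optional ":dead" suffix, then four counters) stays as shown. The struct
memcg_slab_line and parse_memcg_slab_line() names are hypothetical and not
part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of the (assumed) memcg_slabinfo layout. */
struct memcg_slab_line {
	char name[64];
	char css[32];           /* "root" or the css id, without ":dead" */
	bool dead;              /* true if the ":dead" tag was present */
	unsigned long active_objs, num_objs, active_slabs, num_slabs;
};

/*
 * Parse a single line. Returns false for the '#' header line, for empty
 * lines, and for lines that do not have all six fields.
 */
bool parse_memcg_slab_line(const char *line, struct memcg_slab_line *out)
{
	char *tag;

	assert(line && out);
	while (*line == ' ' || *line == '\t')
		line++;
	if (*line == '#' || *line == '\0')
		return false;

	if (sscanf(line, "%63s %31s %lu %lu %lu %lu", out->name, out->css,
		   &out->active_objs, &out->num_objs,
		   &out->active_slabs, &out->num_slabs) != 6)
		return false;

	/* Split "88:dead" into the id and the offline marker. */
	tag = strchr(out->css, ':');
	out->dead = tag && strcmp(tag, ":dead") == 0;
	if (tag)
		*tag = '\0';
	return true;
}
```

A caller could read the debugfs file line by line with fgets() and, for
example, add up active_objs over all rows where dead is set to estimate
how many objects are still pinned by offlined memcgs.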

Cheers,
Longman
