linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>,
	bsingharora@gmail.com, cgroups@vger.kernel.org,
	linux-mm@kvack.org, lizefan@huawei.com
Subject: Re: [PATCH 3/3] memcg: simplify mem_cgroup_reclaim_iter
Date: Wed, 5 Jun 2013 10:39:49 -0400	[thread overview]
Message-ID: <20130605143949.GQ15576@cmpxchg.org> (raw)
In-Reply-To: <20130605082023.GG7303@mtj.dyndns.org>

On Wed, Jun 05, 2013 at 01:20:23AM -0700, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Jun 05, 2013 at 09:30:23AM +0200, Michal Hocko wrote:
> > > I don't really get that.  As long as the amount is bound and the
> > > overhead negligible / acceptable, why does it matter how long the
> > > pinning persists? 
> > 
> > Because the amount is not bound either. Just create a hierarchy and
> > trigger the hard limit and if you are careful enough you can always keep
> > some of the children in the cached pointer (with css reference, if you
> > will) and then release the hierarchy. You can do that repeatedly and
> > leak considerable amount of memory.
> 
> It's still bound, no?  Each live memcg can only keep a limited
> number of cgroups cached, right?

It is bounded by the number of memcgs.  Each one can have 12
(DEF_PRIORITY) references.

> > > We aren't talking about something gigantic or can
> > 
> > mem_cgroup is 888B now (depending on configuration). So I wouldn't call
> > it negligible.
> 
> Do you think that the number can actually grow harmful?  Would you be
> kind enough to share some calculations with me?

5k cgroups * say 10 priority levels * 1k struct mem_cgroup may pin 51M
of dead struct mem_cgroup, plus whatever else the css pins.

> > > In the off chance that this is a real problem, which I strongly doubt,
> > > as I wrote to Johannes, we can implement extremely dumb cleanup
> > > routine rather than this weak reference beast.
> > 
> > That was my first version (https://lkml.org/lkml/2013/1/3/298) and
> > Johannes didn't like it. To be honest I do not care _much_ which way
> > we go, but we definitely cannot pin those objects forever.
> 
> I'll get to the barrier thread but really complex barrier dancing like
> that is only justifiable in extremely hot paths a lot of people pay
> attention to.  It doesn't belong inside memcg proper.  If the cached
> amount is an actual concern, let's please implement a simple clean up
> thing.  All we need is a single delayed_work which scans the tree
> periodically.
> 
> Johannes, what do you think?

While I see your concerns about complexity (and this certainly is not
the most straightforward code), I really can't get too excited about
asynchronous garbage collection, even worse when it's time-based.  It
would probably start out with less code, but two releases later we
would have added all the stuff required to get the interaction right
and to fix the unpredictable reclaim disruption that hits when the
reaper coincides just right with heavy reclaim once a week, etc.  I
just don't think that asynchronous models are simpler than state
machines.  They are harder to reason about and harder to debug.

Now, there are separate things that add complexity to our current
code: the weak pointers, the lockless iterator, and the fact that all
of it is jam-packed into one monolithic iterator function.  I can see
why you are not happy.  But that does not mean we have to get rid of
everything wholesale.

You hate the barriers, so let's add a lock to access the iterator.
That path is not too hot in most cases.

On the other hand, the weak pointer is not too esoteric of a pattern
and we can neatly abstract it into two functions: one that takes an
iterator and returns a verified css reference or NULL, and one to
invalidate pointers when called from the memcg destruction code.

These two things should greatly simplify mem_cgroup_iter() while not
completely abandoning all our optimizations.

What do you think?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

Thread overview: 42+ messages
2013-06-04  0:44 [PATCHSET] memcg: fix and reimplement iterator Tejun Heo
2013-06-04  0:44 ` [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter() Tejun Heo
2013-06-04 13:03   ` Michal Hocko
2013-06-04 13:58     ` Johannes Weiner
2013-06-04 15:29       ` Michal Hocko
2013-06-04  0:44 ` [PATCH 2/3] memcg: restructure mem_cgroup_iter() Tejun Heo
2013-06-04 13:21   ` Michal Hocko
2013-06-04 20:51     ` Tejun Heo
2013-06-04  0:44 ` [PATCH 3/3] memcg: simplify mem_cgroup_reclaim_iter Tejun Heo
2013-06-04 13:18   ` Michal Hocko
2013-06-04 20:50     ` Tejun Heo
2013-06-04 21:28       ` Michal Hocko
2013-06-04 21:55         ` Tejun Heo
2013-06-05  7:30           ` Michal Hocko
2013-06-05  8:20             ` Tejun Heo
2013-06-05  8:36               ` Michal Hocko
2013-06-05  8:44                 ` Tejun Heo
2013-06-05  8:55                   ` Michal Hocko
2013-06-05  9:03                     ` Tejun Heo
2013-06-05 14:39               ` Johannes Weiner [this message]
2013-06-05 14:50                 ` Johannes Weiner
2013-06-05 14:56                 ` Michal Hocko
2013-06-05 17:22                 ` Tejun Heo
2013-06-05 19:45                   ` Johannes Weiner
2013-06-05 20:06                     ` Tejun Heo
2013-06-05 21:17                       ` Johannes Weiner
2013-06-05 22:20                         ` Tejun Heo
2013-06-05 22:27                           ` Tejun Heo
2013-06-06 11:50                             ` Michal Hocko
2013-06-07  0:52                               ` Tejun Heo
2013-06-07  7:37                                 ` Michal Hocko
2013-06-07 23:25                                   ` Tejun Heo
2013-06-10  8:02                                     ` Michal Hocko
2013-06-10 19:54                                       ` Tejun Heo
2013-06-10 20:48                                         ` Michal Hocko
2013-06-10 23:13                                           ` Tejun Heo
2013-06-11  7:27                                             ` Michal Hocko
2013-06-11  7:44                                               ` Tejun Heo
2013-06-11  7:55                                                 ` Michal Hocko
2013-06-11  8:00                                                   ` Tejun Heo
2013-06-04 21:40       ` Johannes Weiner
2013-06-04 21:49         ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save this message as an mbox file, import it into your mail client,
  and reply-to-all from there

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130605143949.GQ15576@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mhocko@suse.cz \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox; see the archive's mirroring instructions for
how to clone and mirror all data and code used for this inbox.