linux-mm.kvack.org archive mirror
From: Roman Gushchin <guro@fb.com>
To: "sf-pc@lists.linux-foundation.org" <sf-pc@lists.linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"riel@surriel.com" <riel@surriel.com>,
	"dchinner@redhat.com" <dchinner@redhat.com>,
	"dairinin@gmail.com" <dairinin@gmail.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Subject: [LSF/MM TOPIC] dying memory cgroups and slab reclaim issues
Date: Mon, 18 Feb 2019 23:53:20 +0000	[thread overview]
Message-ID: <20190218235313.GA4627@castle.DHCP.thefacebook.com>

Recent reverts of the memcg leak fixes [1, 2] reintroduced the problem
of accumulating dying memory cgroups. This is a serious problem: on most
of our machines we've seen thousands of dying cgroups, and the
corresponding memory footprint was measured in hundreds of megabytes.
The problem has also been independently discovered by other companies.

The fixes were reverted due to an xfs regression investigated by Dave Chinner.
Around the same time we saw a very small (0.18%) cpu regression on some hosts,
which prompted Rik van Riel to propose a patch [3] aimed at fixing that
regression. The idea is to accumulate small amounts of memory pressure and
apply them periodically, so that we don't overscan small shrinker lists.
According to Jan Kara's data [4], Rik's patch fixed the regression partially,
but not entirely.

The path forward isn't entirely clear now, and the status quo isn't acceptable
due to the memcg leak bug. Dave and Michal's position is to focus on the dying
memory cgroup case and to apply some artificial memory pressure to the
corresponding slabs (probably during the cgroup deletion process). This
approach could theoretically be less harmful to the subtle scanning balance
and avoid causing any regressions.

In my opinion, that's not necessarily true. Slab objects can be shared between
cgroups, and often can't be reclaimed on cgroup removal without an impact on the
rest of the system. Applying constant artificial memory pressure precisely and
only to objects accounted to dying cgroups is challenging and will likely
cause quite significant overhead. Also, by "forgetting" some slab objects
under light or even moderate memory pressure, we're wasting memory that could
be used for something useful. Dying cgroups just make this problem more
obvious because of their size.

So, using "natural" memory pressure in such a way that all slab objects are
scanned periodically seems to me like the best solution. The devil is in the
details, and how to do it without causing any regressions is an open question now.

Also, completely re-parenting slabs to the parent cgroup (not only the
shrinker lists) is a potential option to consider.

It would be nice to discuss the problem at LSF/MM, agree on a general path
forward, and put together a list of benchmarks that can be used to validate
the solution.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9a238e83fbb0df31c3b9b67003f8f9d1d1b6c96
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=69056ee6a8a3d576ed31e38b3b14c70d6c74edcc
[3] https://lkml.org/lkml/2019/1/28/1865
[4] https://lkml.org/lkml/2019/2/8/336


