From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
Hugh Dickins <hughd@google.com>,
Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>,
Glauber Costa <glommer@gmail.com>, Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Pavel Emelianov <xemul@parallels.com>,
Konstantin Khorenko <khorenko@parallels.com>,
LKML-MM <linux-mm@kvack.org>,
LKML-cgroups <cgroups@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] memory cgroup: my thoughts on memsw
Date: Tue, 16 Sep 2014 10:34:55 +0900
Message-ID: <541793BF.7070106@jp.fujitsu.com>
In-Reply-To: <20140915191435.GA8950@cmpxchg.org>

(2014/09/16 4:14), Johannes Weiner wrote:
> Hi Vladimir,
>
> On Thu, Sep 04, 2014 at 06:30:55PM +0400, Vladimir Davydov wrote:
>> To sum it up, the current mem + memsw configuration scheme doesn't allow
>> us to limit swap usage if we want to partition the system dynamically
>> using soft limits. Actually, it also looks rather confusing to me. We
>> have mem limit and mem+swap limit. I bet that from the first glance, an
>> average admin will think it's possible to limit swap usage by setting
>> the limits so that the difference between memory.memsw.limit and
>> memory.limit equals the maximal swap usage, but (surprise!) it isn't
>> really so. It holds if there's no global memory pressure, but otherwise
>> swap usage is only limited by memory.memsw.limit! IMHO, it isn't
>> something obvious.
>
> Agreed, memory+swap accounting & limiting is broken.
>
>> - Anon memory is handled by the user application, while file caches are
>> all on the kernel. That means the application will *definitely* die
>> w/o anon memory. W/o file caches it usually can survive, but the more
>> caches it has the better it feels.
>>
>> - Anon memory is not that easy to reclaim. Swap out is a really slow
>> process, because data are usually read/written w/o any specific
>> order. Dropping file caches is much easier. Typically we have lots of
>> clean pages there.
>>
>> - Swap space is limited. And today, it's OK to have TBs of RAM and only
>> several GBs of swap. Customers simply don't want to waste their disk
>> space on that.
>
>> Finally, my understanding (may be crazy!) how the things should be
>> configured. Just like now, there should be mem_cgroup->res accounting
>> and limiting total user memory (cache+anon) usage for processes inside
>> cgroups. This is where there's nothing to do. However, mem_cgroup->memsw
>> should be reworked to account *only* memory that may be swapped out plus
>> memory that has been swapped out (i.e. swap usage).
>
> But anon pages are not a resource, they are a swap space liability.
> Think of virtual memory vs. physical pages - the use of one does not
> necessarily result in the use of the other. Without memory pressure,
> anonymous pages do not consume swap space.
>
> What we *should* be accounting and limiting here is the actual finite
> resource: swap space. Whenever we try to swap a page, its owner
> should be charged for the swap space - or the swapout be rejected.
>
> For hard limit reclaim, the semantics of a swap space limit would be
> fairly obvious, because it's clear who the offender is.
>
> However, in an overcommitted machine, the amount of swap space used by
> a particular group depends just as much on the behavior of the other
> groups in the system, so the per-group swap limit should be enforced
> even during global reclaim to feed back pressure on whoever is causing
> the swapout. If reclaim fails, the global OOM killer triggers, which
> should then off the group with the biggest soft limit excess.
>
> As far as implementation goes, it should be doable to try-charge from
> add_to_swap() and keep the uncharging in swap_entry_free().
>
> We'll also have to extend the global OOM killer to be memcg-aware, but
> we've been meaning to do that anyway.
>
When we introduced the memsw limit, we tried to avoid affecting global memory reclaim.
That is why we limited memory+swap rather than swap alone.
Now global memory reclaim is memcg-aware, so I think limiting swap rather than
anon+swap may be an option. The change will also reduce res_counter accesses. Hmm, it would be
desirable to move anon pages to Unevictable when the memcg has no swap slots available.
Anyway, I think softlimit should be re-implemented first. That will be the starting point.
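
To make the swap-only idea concrete, here is a toy sketch of charging swap space at swap-out time and uncharging when the entry is freed. It is plain Python, not kernel code: the thread names add_to_swap() and swap_entry_free() as the charge and uncharge points, but SwapCounter, try_charge(), uncharge() and reclaim_anon() below are hypothetical names made up for illustration, and pages are just counted, not tracked.

```python
class SwapCounter:
    """Toy per-group swap counter (hypothetical, not the kernel's res_counter)."""

    def __init__(self, limit_pages):
        self.limit = limit_pages
        self.usage = 0

    def try_charge(self, pages=1):
        """Charge swap space before a swap-out; refuse if it would exceed the limit."""
        if self.usage + pages > self.limit:
            return False
        self.usage += pages
        return True

    def uncharge(self, pages=1):
        """Give swap space back when a swap entry is freed."""
        self.usage = max(0, self.usage - pages)


def reclaim_anon(counter, anon_pages, want):
    """Try to swap out `want` of a group's anon pages.

    The same check applies whether the caller stands for limit reclaim
    or global reclaim. Returns (swapped, still_resident).
    """
    swapped = 0
    for _ in range(min(want, anon_pages)):
        if not counter.try_charge():
            break  # swap-out rejected: the page stays in memory
        swapped += 1
    return swapped, anon_pages - swapped


group = SwapCounter(limit_pages=2)
swapped, resident = reclaim_anon(group, anon_pages=5, want=4)
print(swapped, resident, group.usage)  # 2 3 2

group.uncharge(1)                      # one swap entry freed
print(group.try_charge())              # True: there is room again
```

In this model the group's swap usage never exceeds its limit, no matter who drives the reclaim. The pages left resident after a rejected swap-out are the ones that would effectively be unevictable.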
Thanks,
-Kame