From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] memcg: remove mem_cgroup_reclaimable check from soft reclaim
Date: Wed, 22 Oct 2014 15:51:31 +0200
Message-ID: <20141022135131.GB30802@dhcp22.suse.cz>
In-Reply-To: <20141022124025.GA17161@phnom.home.cmpxchg.org>
On Wed 22-10-14 08:40:25, Johannes Weiner wrote:
> On Wed, Oct 22, 2014 at 01:21:16PM +0200, Michal Hocko wrote:
> > On Tue 21-10-14 14:22:39, Johannes Weiner wrote:
> > [...]
> > > From 27bd24b00433d9f6c8d60ba2b13dbff158b06c13 Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > Date: Tue, 21 Oct 2014 09:53:54 -0400
> > > Subject: [patch] mm: memcontrol: do not filter reclaimable nodes in NUMA
> > > round-robin
> > >
> > > The round-robin node reclaim currently tries to include only nodes
> > > that have memory of the memcg in question, which is quite elaborate.
> > >
> > > Just use plain round-robin over the nodes that are allowed by the
> > > task's cpuset, which are the most likely to contain that memcg's
> > > memory. But even if zones without memcg memory are encountered,
> > > direct reclaim will skip over them without too much hassle.
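[For reference, the round-robin selection over an allowed-node mask that the patch describes can be modeled in plain C roughly as below. This is an illustrative userspace sketch only; the kernel iterates a nodemask_t with first_node()/next_node(), and every name here is hypothetical. The mask is a bitmask where bit n set means node n is allowed, and selection starts after the previously used node and wraps around once.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of round-robin node selection over an
 * allowed-node bitmask (bit n set == node n allowed).  Starting after
 * the previously selected node, return the next allowed node, wrapping
 * around; return -1 if the mask is empty. */
static int next_allowed_node(uint64_t allowed, int prev, int nr_nodes)
{
	for (int i = 1; i <= nr_nodes; i++) {
		int node = (prev + i) % nr_nodes;

		if (allowed & (1ULL << node))
			return node;
	}
	return -1;	/* no allowed nodes at all */
}
```

[With all 16 bits set (the node_online_map case) this simply cycles 0, 1, ..., 15; with only bits 14 and 15 set (a cpuset binding) it alternates between nodes 14 and 15.]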
> >
> > I do not think that using the current task's node mask is correct.
> > Different tasks in the same memcg might be bound to different nodes,
> > and then a set of nodes might be reclaimed much more heavily if a
> > particular task hits the limit more often. It also doesn't make much
> > sense from a semantic POV: we are reclaiming the memcg, so the mask
> > should be the union of all its tasks' allowed nodes.
>
> Unless the cpuset hierarchy is separate from the memcg hierarchy, all
> tasks in the memcg belong to the same cpuset. And the whole point of
> cpusets is that a group of tasks has the same nodemask, no?
Memory limit and memory placement are orthogonal configurations, and they
might be stacked one on top of the other in either direction.
> Sure, there are *possible* configurations for which this assumption
> breaks, like multiple hierarchies, but are they sensible? Do we care?
Why wouldn't they be sensible? What is wrong with limiting the memory of
a load which internally uses node placement for its components?
> > What we do currently is overly complicated though and I agree that there
> > is no good reason for it.
> > Let's just s@cpuset_current_mems_allowed@node_online_map@ and round
> > robin over all nodes. As you said we do not have to optimize for empty
> > zones.
>
> That was what I first had. And cpuset_current_mems_allowed defaults
> to node_online_map, but once the user sets up cpusets in conjunction
> with memcgs, it seems to be the preferred value.
>
> The other end of this is that if you have 16 nodes and use cpuset to
> bind the task to node 14 and 15, round-robin iterations of node 1-13
> will reclaim the group's memory on 14 and only the 15 iteration will
> actually look at memory from node 15 first.
mem_cgroup_select_victim_node could check the reclaimability of the memcg
(hierarchy) and skip nodes without pages. Or would that be too
expensive? We are in the slow path already.
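[The idea above can be sketched as follows; again a hedged userspace model with hypothetical names, not kernel code. Round-robin runs over all online nodes, but a per-node "memcg has pages here" predicate lets empty nodes be passed over; in the kernel that predicate would be a check of the memcg's per-node LRU counts. Both node sets are modeled as bitmasks.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: round-robin over online nodes, skipping nodes on
 * which the memcg has no pages.  'online' and 'has_pages' are bitmasks
 * (bit n == node n); return -1 if the memcg has no pages anywhere. */
static int select_victim_node(uint64_t online, uint64_t has_pages,
			      int prev, int nr_nodes)
{
	for (int i = 1; i <= nr_nodes; i++) {
		int node = (prev + i) % nr_nodes;

		if ((online & (1ULL << node)) &&
		    (has_pages & (1ULL << node)))
			return node;
	}
	return -1;	/* nothing reclaimable on any online node */
}
```

[This directly addresses the 16-node example: with 16 online nodes and the memcg's pages only on nodes 14 and 15, the selection alternates between 14 and 15 instead of walking through thirteen empty nodes first.]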
> It seems using the cpuset bindings, while theoretically independent,
> would do the right thing for all intents and purposes.
Only if the cpuset is on top of the memcg, not the other way around, as
mentioned above (possible node over-reclaim).
--
Michal Hocko
SUSE Labs