From: Michal Hocko <mhocko@kernel.org>
To: Roman Gushchin <guro@fb.com>
Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Tejun Heo <tj@kernel.org>,
	kernel-team@fb.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm, oom: introduce memory.oom.group
Date: Wed, 1 Aug 2018 07:55:03 +0200
Message-ID: <20180801055503.GB16767@dhcp22.suse.cz>
In-Reply-To: <20180801011447.GB25953@castle.DHCP.thefacebook.com>

On Tue 31-07-18 18:14:48, Roman Gushchin wrote:
> On Tue, Jul 31, 2018 at 11:07:00AM +0200, Michal Hocko wrote:
> > On Mon 30-07-18 11:01:00, Roman Gushchin wrote:
> > > For some workloads an intervention from the OOM killer
> > > can be painful. Killing a random task can bring
> > > the workload into an inconsistent state.
> > > 
> > > Historically, there are two common solutions for this
> > > problem:
> > > 1) enabling panic_on_oom,
> > > 2) using a userspace daemon to monitor OOMs and kill
> > >    all outstanding processes.
> > > 
> > > Both approaches have their downsides:
> > > rebooting on each OOM is an obvious waste of capacity,
> > > and handling everything in userspace is tricky and requires
> > > a userspace agent that monitors all cgroups
> > > for OOMs.
> > > 
> > > In most cases an in-kernel after-OOM cleanup
> > > mechanism can eliminate the need to enable
> > > panic_on_oom. It can also simplify cgroup
> > > management for userspace applications.
> > > 
> > > This commit introduces a new knob for the cgroup v2 memory
> > > controller: memory.oom.group. The knob determines
> > > whether the cgroup should be treated as a single
> > > unit by the OOM killer. If set, the cgroup and its
> > > descendants are killed together or not at all.
> > 
> > I do not want to nitpick on wording, but "unit" is not really a good
> > description. I would expect that to mean that the oom killer will
> > also consider the unit when selecting the task, and that is not the case.
> > I would be more explicit about this being a single killable entity,
> > because it forms an indivisible workload.
> > 
> > You can reuse http://lkml.kernel.org/r/20180730080357.GA24267@dhcp22.suse.cz
> > if you want.
> 
> Ok, I'll do my best to make it clearer.
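
Just for illustration, this is how I would expect the knob to be used from
userspace on the unified hierarchy. A minimal sketch; the mount point and the
A1 cgroup name are only examples, not something the patch mandates:

/* Minimal sketch: enable memory.oom.group for one cgroup (v2 hierarchy). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Path is an example; adjust to the actual cgroup mount and name. */
	int fd = open("/sys/fs/cgroup/A1/memory.oom.group", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Writing "1" marks the cgroup and its descendants as one killable unit. */
	if (write(fd, "1", 1) != 1) {
		perror("write");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}

Once set, a memcg OOM in A1 (or below) is expected to take down every task in
that subtree rather than a single victim.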
> 
> > 
> > [...]
> > > +/**
> > > + * mem_cgroup_get_oom_group - get a memory cgroup to clean up after OOM
> > > + * @victim: task to be killed by the OOM killer
> > > + * @oom_domain: memcg in case of memcg OOM, NULL in case of system-wide OOM
> > > + *
> > > + * Returns a pointer to a memory cgroup, which has to be cleaned up
> > > + * by killing all OOM-killable tasks belonging to it.
> > 
> > Caller has to call mem_cgroup_put on the returned non-null memcg.
> 
> Added.
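
For anybody following along: mem_cgroup_put() is the helper added in patch 1/3.
I would expect it to be little more than a wrapper around css_put(), roughly
like the sketch below (see patch 1/3 for the real helper):

/* Sketch only - the actual helper is introduced in patch 1/3. */
static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
	if (memcg)
		css_put(&memcg->css);
}

So the contract is simply that mem_cgroup_get_oom_group() takes a css reference
on the returned memcg and the caller drops it once it is done.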
> 
> > 
> > > + */
> > > +struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim,
> > > +					    struct mem_cgroup *oom_domain)
> > > +{
> > > +	struct mem_cgroup *oom_group = NULL;
> > > +	struct mem_cgroup *memcg;
> > > +
> > > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > > +		return NULL;
> > > +
> > > +	if (!oom_domain)
> > > +		oom_domain = root_mem_cgroup;
> > > +
> > > +	rcu_read_lock();
> > > +
> > > +	memcg = mem_cgroup_from_task(victim);
> > > +	if (!memcg || memcg == root_mem_cgroup)
> > > +		goto out;
> > 
> > When can we have memcg == NULL? victim should always be non-NULL.
> > Also, why do you need to special-case the root_mem_cgroup here? The loop
> > below should handle that just fine, no?
> 
> Idk, I prefer to keep an explicit root_mem_cgroup check,
> rather than traversing the tree and relying on an inability
> to set oom_group on the root.

I will not insist but this just makes the code harder to read.
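
To make my point concrete, the kind of walk I have in mind is roughly the
sketch below (not necessarily the loop as posted in the patch):

/*
 * Sketch only: walk from the victim's memcg up towards the OOM domain and
 * remember the highest ancestor with oom_group set.  Since oom_group cannot
 * be enabled on the root memcg, a task running in the root naturally yields
 * NULL without any explicit special case.
 */
for (; memcg; memcg = parent_mem_cgroup(memcg)) {
	if (memcg->oom_group)
		oom_group = memcg;

	if (memcg == oom_domain)
		break;
}

if (oom_group)
	css_get(&oom_group->css);

i.e. the traversal itself already does the right thing for root.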

[...]
> > > +	if (oom_group) {
> > 
> > we want a printk explaining that we are going to tear down the whole
> > oom_group here.
> 
> Does this look good?
> Or is it better to remove the "memory." prefix?
> 
> [   52.835327] Out of memory: Kill process 1221 (allocate) score 241 or sacrifice child
> [   52.836625] Killed process 1221 (allocate) total-vm:2257144kB, anon-rss:2009128kB, file-rss:4kB, shmem-rss:0kB
> [   52.841431] Tasks in /A1 are going to be killed due to memory.oom.group set

Yes, looks good to me.

> [   52.869439] Killed process 1217 (allocate) total-vm:2052344kB, anon-rss:1704036kB, file-rss:0kB, shmem-rss:0kB
> [   52.875601] Killed process 1218 (allocate) total-vm:106668kB, anon-rss:24668kB, file-rss:0kB, shmem-rss:0kB
> [   52.882914] Killed process 1219 (allocate) total-vm:106668kB, anon-rss:21528kB, file-rss:0kB, shmem-rss:0kB
> [   52.891806] Killed process 1220 (allocate) total-vm:2257144kB, anon-rss:1984120kB, file-rss:4kB, shmem-rss:0kB
> [   52.903770] Killed process 1221 (allocate) total-vm:2257144kB, anon-rss:2009128kB, file-rss:4kB, shmem-rss:0kB
> [   52.905574] Killed process 1222 (allocate) total-vm:2257144kB, anon-rss:2063640kB, file-rss:0kB, shmem-rss:0kB
> [   53.202153] oom_reaper: reaped process 1222 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
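
FWIW, one straightforward way to emit that "Tasks in ..." line is the existing
pr_cont_cgroup_path() helper; a sketch, where the function name is only
illustrative and not necessarily what the patch ends up with:

/* Illustrative sketch: announce that the whole group is going down. */
static void mem_cgroup_announce_oom_group(struct mem_cgroup *memcg)
{
	pr_info("Tasks in ");
	pr_cont_cgroup_path(memcg->css.cgroup);
	pr_cont(" are going to be killed due to memory.oom.group set\n");
}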
> 
> > 
> > > +		mem_cgroup_scan_tasks(oom_group, oom_kill_memcg_member, NULL);
> > > +		mem_cgroup_put(oom_group);
> > > +	}
> > >  }
> > 
> > Other than that it looks good to me. My concern that the previous
> > implementation was more consistent because we were comparing memcgs
> > still holds, but if there is no way forward in that direction, this
> > should be acceptable as well.
> > 
> > After above small things are addressed you can add
> > Acked-by: Michal Hocko <mhocko@suse.com> 
> 
> Thank you!

-- 
Michal Hocko
SUSE Labs

Thread overview: 28+ messages
2018-07-30 18:00 [PATCH 0/3] " Roman Gushchin
2018-07-30 18:00 ` [PATCH 1/3] mm: introduce mem_cgroup_put() helper Roman Gushchin
2018-07-31  8:45   ` Michal Hocko
2018-07-31 14:58     ` Shakeel Butt
2018-08-01  5:53       ` Michal Hocko
2018-08-01 17:31   ` Johannes Weiner
2018-07-30 18:00 ` [PATCH 2/3] mm, oom: refactor oom_kill_process() Roman Gushchin
2018-08-01 17:32   ` Johannes Weiner
2018-07-30 18:01 ` [PATCH 3/3] mm, oom: introduce memory.oom.group Roman Gushchin
2018-07-31  9:07   ` Michal Hocko
2018-08-01  1:14     ` Roman Gushchin
2018-08-01  5:55       ` Michal Hocko [this message]
2018-08-01 17:48         ` Johannes Weiner
2018-08-01 17:50   ` Johannes Weiner
2018-07-31  1:49 ` [PATCH 0/3] " David Rientjes
2018-07-31 15:54   ` Johannes Weiner
2018-07-31 23:51   ` Roman Gushchin
2018-08-01 21:51     ` David Rientjes
2018-08-01 22:47       ` Roman Gushchin
2018-08-06 21:34         ` David Rientjes
2018-08-07  0:30           ` Roman Gushchin
2018-08-07 22:34             ` David Rientjes
2018-08-08 10:59               ` Michal Hocko
2018-08-09 20:10                 ` David Rientjes
2018-08-10  7:03                   ` Michal Hocko
2018-08-19 23:26               ` cgroup aware oom killer (was Re: [PATCH 0/3] introduce memory.oom.group) David Rientjes
2018-08-20 19:05                 ` Roman Gushchin
2018-08-02  8:00       ` [PATCH 0/3] introduce memory.oom.group Michal Hocko
