From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f69.google.com (mail-lf0-f69.google.com [209.85.215.69]) by kanga.kvack.org (Postfix) with ESMTP id 40D796B0279 for ; Tue, 23 May 2017 06:32:21 -0400 (EDT) Received: by mail-lf0-f69.google.com with SMTP id l188so18336909lfg.2 for ; Tue, 23 May 2017 03:32:21 -0700 (PDT) Received: from forwardcorp1o.cmail.yandex.net (forwardcorp1o.cmail.yandex.net. [37.9.109.47]) by mx.google.com with ESMTPS id d137si9437319lfd.200.2017.05.23.03.32.19 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 23 May 2017 03:32:19 -0700 (PDT) Subject: Re: [PATCH] mm/oom_kill: count global and memory cgroup oom kills References: <149520375057.74196.2843113275800730971.stgit@buzz> From: Konstantin Khlebnikov Message-ID: <0f67046d-cdf6-1264-26f6-11c82978c621@yandex-team.ru> Date: Tue, 23 May 2017 13:32:17 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: Roman Guschin , linux-mm@kvack.org, Andrew Morton , Tejun Heo , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Vlastimil Babka , Michal Hocko , hannes@cmpxchg.org On 23.05.2017 10:49, David Rientjes wrote: > On Mon, 22 May 2017, Konstantin Khlebnikov wrote: > >> Nope, they are different. I think we should rephase documentation somehow >> >> low - count of reclaims below low level >> high - count of post-allocation reclaims above high level >> max - count of direct reclaims >> oom - count of failed direct reclaims >> oom_kill - count of oom killer invocations and killed processes >> > > In our kernel, we've maintained counts of oom kills per memcg for years as > part of memory.oom_control for memcg v1, but we've also found it helpful > to complement that with another count that specifies the number of > processes oom killed that were attached to that exact memcg. > > In your patch, oom_kill in memory.oom_control specifies that number of oom > events that resulted in an oom kill of a process from that hierarchy, but > not the number of processes killed from a specific memcg (the difference > between oc->memcg and mem_cgroup_from_task(victim)). Not sure if you > would also find it helpful. > This is worth addition. Let's call it "oom_victim" for short. It allows to locate leaky part if they are spread over sub-containers within common limit. But doesn't tell which limit caused this kill. For hierarchical limits this might be not so easy. I think oom_kill better suits for automatic actions - restart affected hierarchy, increase limits, e.t.c. But oom_victim allows to determine container affected by global oom killer. So, probably it's worth to merge them together and increment oom_kill by global killer for victim memcg: if (!is_memcg_oom(oc)) { count_vm_event(OOM_KILL); mem_cgroup_count_vm_event(mm, OOM_KILL); } else mem_cgroup_event(oc->memcg, OOM_KILL); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org