Subject: Re: [PATCH v2] mm, memcg: avoid oom if cgroup is not populated
To: Yafang Shao, mhocko@kernel.org, hannes@cmpxchg.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, Michal Hocko
References: <1574818117-2885-1-git-send-email-laoar.shao@gmail.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Date: Wed, 27 Nov 2019 12:11:24 +0100
In-Reply-To: <1574818117-2885-1-git-send-email-laoar.shao@gmail.com>

On 27.11.19 02:28, Yafang Shao wrote:

Let me give this patch description an overhaul:

> There's one case that the processes in a memcg are all exit (due to OOM
> group or some other reasons), but the file page caches are still exist.

"When there are no more processes in a memcg (e.g., due to OOM group),
we can still have file pages in the page cache."

> These file page caches may be protected by memory.min so can't be
> reclaimed.
> If we can't success to restart the processes in this memcg or
> don't want to make this memcg offline, then we want to drop the file page
> caches.

"If these pages are protected by memory.min, they can't be reclaimed.
Especially if there won't be another process in this memcg and the memcg
is kept online, we do want to drop these pages from the page cache."

> The advantage of droping this file caches is it can avoid the reclaimer
> (either kswapd or direct) scanning and reclaiming pages from all memcgs
> exist in this system, because currently the reclaimer will fairly reclaim
> pages from all memcgs if the system is under memory pressure.

"By dropping these page caches, we can prevent reclaimers (e.g., kswapd or
direct reclaim) from scanning and reclaiming pages from all memcgs in the
system - because the reclaimers will try to fairly reclaim pages from all
memcgs in the system when under memory pressure."

> The possible method to drop these file page caches is setting the
> hard limit of this memcg to 0. Unfortunately this may invoke the OOM killer
> and generates lots of outputs, that should not happen.
> The OOM output is not expected by the admin if he or she wants to drop
> the cahes and knows there're no processes in this memcg.

"By setting the hard limit of such a memcg to 0, we can drop the page
cache of such memcgs. Unfortunately, this may invoke the OOM killer and
generate a lot of output. The OOM output is not expected by an admin who
wants to drop these caches and knows that there are no processes in this
memcg anymore."

>
> If memcg is not populated, we should not invoke the OOM killer because
> there's nothing to kill. Next time when you start a new process and if the
> max is still bellow usage, the OOM killer will be invoked and your new
> process is killed, so we can cosider it as lazy OOM, that is we have been
> always doing in the kernel.

"Therefore, if a memcg is not populated, we should not invoke the OOM
killer - there is nothing to kill.
The next time a new process is started in the memcg and the "max" is
still below usage, the OOM killer will be invoked and the new process
will be killed."

1. I don't think the "lazy OOM" part is relevant.

2. Where is the part that modifies the limits? Or did you drop that? Is
it part of another patch?

3. I think I agree with Michal that modifying the limits smells more
like a configuration thingy to be handled by an admin (especially,
adapting min/max properly). But again, not sure where that change is
located :)

4. This patch on its own (if there are no processes, there is nothing to
kill) does not sound too wrong to me. Instead of an endless loop
(besides signals) where we can't make any progress, we exit right away.
(I am not yet too familiar with memcg, Michal is clearly the expert :) )

>
> Fixes: b6e6edcf ("mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage")
> Signed-off-by: Yafang Shao
> Cc: Johannes Weiner
> Cc: Michal Hocko
> ---
>  mm/memcontrol.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1c4c08b..e936f1b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>  			continue;
>  		}
>  
> -		memcg_memory_event(memcg, MEMCG_OOM);
> -		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> +		/* If there's no procesess, we don't need to invoke the OOM
> +		 * killer. Then next time when you try to start a process
> +		 * in this memcg, the max may still bellow usage, and then
> +		 * this OOM killer will be invoked. This can be considered
> +		 * as lazy OOM, that is we have been always doing in the
> +		 * kernel. Pls. Michal, that is really consistency.
> +		 */
> +		if (cgroup_is_populated(memcg->css.cgroup)) {
> +			memcg_memory_event(memcg, MEMCG_OOM);
> +			if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> +				break;
> +		} else {
>  			break;
> +		}
>  	}
>  
>  	memcg_wb_domain_size_changed(memcg);
>

-- 
Thanks,

David / dhildenb
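
To make point 4 a bit more concrete, here is a minimal user-space sketch of
the retry loop with the "is the cgroup populated?" check added. It is only a
toy model, not kernel code: struct toy_memcg, toy_reclaim(), toy_oom_kill()
and toy_set_max() are hypothetical names made up for illustration, the
"unreclaimable" field is an assumption standing in for pages that reclaim
cannot free, and the retry count of 5 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memcg; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	unsigned long usage;         /* pages currently charged */
	unsigned long unreclaimable; /* assumed: pages reclaim cannot free */
	int nr_procs;                /* processes attached to the cgroup */
	int oom_events;              /* how often the OOM path was entered */
};

/* Toy reclaim: shrink usage towards max, but never below the floor. */
static bool toy_reclaim(struct toy_memcg *cg, unsigned long max)
{
	unsigned long floor = cg->unreclaimable > max ? cg->unreclaimable : max;

	if (cg->usage <= floor)
		return false;	/* no progress possible */
	cg->usage = floor;
	return true;
}

/* Toy OOM kill: records the event, kills one process if there is one. */
static bool toy_oom_kill(struct toy_memcg *cg)
{
	cg->oom_events++;
	if (cg->nr_procs == 0)
		return false;	/* no victim found */
	cg->nr_procs--;
	return true;
}

/* Toy version of lowering the hard limit below the current usage. */
static void toy_set_max(struct toy_memcg *cg, unsigned long max)
{
	int nr_reclaims = 5;	/* arbitrary retry budget for this sketch */

	assert(cg);
	while (cg->usage > max) {
		if (nr_reclaims) {
			if (!toy_reclaim(cg, max))
				nr_reclaims--;
			continue;
		}
		/* The idea of the patch: nothing to kill, so stop quietly. */
		if (cg->nr_procs == 0)
			break;
		if (!toy_oom_kill(cg))
			break;
	}
}
```

In this model, an empty cgroup leaves the loop without ever entering the OOM
path (so no OOM event and no OOM output), while a populated one still goes
through toy_oom_kill() as before.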