Date: Mon, 4 May 2020 14:46:27 +0200
From: Michal Hocko
To: Yafang Shao
Cc: Andrew Morton, Shakeel Butt, Johannes Weiner, Roman Gushchin, Greg Thelen, Linux MM
Subject: Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
Message-ID: <20200504124627.GP22838@dhcp22.suse.cz>
References: <20200504042621.10334-1-laoar.shao@gmail.com> <20200504042621.10334-3-laoar.shao@gmail.com> <20200504081848.GJ22838@dhcp22.suse.cz>

On Mon 04-05-20 20:34:01, Yafang Shao wrote:
> On Mon, May 4, 2020 at 4:18 PM Michal Hocko wrote:
> >
> > [It would be really great if a newer version was posted only after there
> > was a wider consensus on the approach.]
> >
> > On Mon 04-05-20 00:26:21, Yafang Shao wrote:
> > > Recently Shakeel reported an issue which also confused me several months
> > > earlier. Below is his report -
> > > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > > succeed. However if oom-killer does not find a process for killing, it
> > > dumps a lot of warnings.
> > > Deleting a memcg does not reclaim memory from it and the memory can
> > > linger till there is a memory pressure. One normal way to proactively
> > > reclaim such memory is to set memory.max to 0 just before deleting the
> > > memcg. However if some of the memcg's memory is pinned by others, this
> > > operation can trigger an oom-kill without any process and thus can log a
> > > lot of un-needed warnings. So, ignore all such warnings from memory.max.
> > >
> > > A better way to avoid this issue is to avoid trying to kill a process if
> > > memcg is not populated.
> > > Note that OOM is different from OOM kill. OOM is a status that the
> > > system or memcg is out of memory, while OOM kill is a result that a
> > > process inside this memcg is killed when this memcg is in OOM status.
> >
> > Agreed.
> >
> > > That is the same reason why there're both MEMCG_OOM event and
> > > MEMCG_OOM_KILL event. If we have already known that there's nothing to
> > > kill, i.e. the memcg is not populated, then we don't need a try.
> >
> > OK, but you are not explaining why a silent failure is really better
> > than no oom report under oom situation. With your patch, there is
> > no failure reported to the user and there is also no sign that there
> > might be a problem that memcg leaves memory behind that is not bound to
> > any (killable) process. This could be important information.
> >
>
> That is not a silent failure. An oom event will be reported.
> The user can get this event by memory.events or memory.events.local if
> he really cares about it.

You are right. The oom situation will be reported (somehow) but there
might be several reasons why no task has been killed, and there is no
way to report that there were no eligible tasks.

> Especially when the admin sets memory.max to 0 to drop all the caches,
> many oom logs are noise, besides that there are some side effects,
> for example too many oom logs printed to a slow console may cause some
> latency spike.

But the oom situation and the oom report are simply something an admin
has to expect, especially when the hard limit is set to 0. With kmem
accounting there is no guarantee that the target will be met.

> >
> > Besides that I really do not see any actual problem that this would be
> > fixing.
>
> Avoid printing too many oom logs.

There is only a single oom report printed so I disagree this is really a
proper justification.

Unless you can come up with a better justification I am against this
patch. It unnecessarily reduces debugging tools while it doesn't really
provide any huge advantage. Changing the hard limit to an impossible
target is known to trigger the oom killer and the oom report is a part
of that. If the oom report is too noisy then we can discuss how to make
it more compact but making ad-hoc exceptions like this one is not a good
solution.
-- 
Michal Hocko
SUSE Labs
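
For reference, the memory.events counters mentioned in the thread can be
inspected with a short script. What follows is only a minimal sketch,
assuming the cgroup v2 flat "key value" file format; the helper names and
the sample values are hypothetical and are not taken from the patch. It
only shows that "oom" and "oom_kill" are separate counters. As pointed
out above, the counters alone do not say why no task was killed.

```python
def parse_memory_events(text):
    """Parse the flat 'key value' lines of a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        key, value = parts
        events[key] = int(value)
    return events


def oom_without_kill(events):
    """Hypothetical helper: True if an OOM was counted but no kill was."""
    return events.get("oom", 0) > 0 and events.get("oom_kill", 0) == 0


# Sample file contents; the numbers are made up for illustration.
SAMPLE = """\
low 0
high 0
max 12
oom 1
oom_kill 0
"""

if __name__ == "__main__":
    events = parse_memory_events(SAMPLE)
    print(events["oom"], events["oom_kill"], oom_without_kill(events))
```

On a real system the text would come from reading the memory.events (or
memory.events.local) file of the memcg in question rather than from the
SAMPLE string.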