From: Yafang Shao
Date: Mon, 20 Apr 2020 16:52:05 +0800
Subject: Re: [PATCH 3/3] memcg oom: bail out from the charge path if no victim found
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Linux MM
In-Reply-To: <20200420081353.GI27314@dhcp22.suse.cz>

On Mon, Apr 20, 2020 at 4:13 PM Michal Hocko wrote:
>
> On Sat 18-04-20 11:13:11, Yafang Shao wrote:
> > Setting aside manually triggered OOM, if no victim is found in a
> > system OOM, the system will be deadlocked on memory; however, if no
> > victim is found in a memcg OOM, the charge succeeds and the task keeps
> > running. This behavior in memcg OOM is not proper because it can
> > prevent the memcg from being limited.
> >
> > Take a simple example.
> > $ cd /sys/fs/cgroup/foo/
> > $ echo $$ > cgroup.procs
> > $ echo 200M > memory.max
> > $ cat memory.max
> > 209715200
> > $ echo -1000 > /proc/$$/oom_score_adj
> > Then, let's run a memhog task in memcg foo, which will allocate 1G
> > memory and keep running.
> > $ /home/yafang/test/memhog &
>
> Well, echo -1000 is a privileged operation. And it has to be used with
> an extreme care because you know that you are creating an unkillable
> task. So the above test is a clear example of the misconfiguration.
>

Right. This issue is indeed triggered by the misconfiguration.

> > Then memory.current will be greater than memory.max. Run the command
> > below in another shell.
> > $ cat /sys/fs/cgroup/foo/memory.current
> > 1097228288
> > The tasks which have already allocated memory and won't allocate new
> > memory still run well. This behavior makes no sense.
> >
> > This patch is to improve it.
> > If no victim is found in memcg OOM, we should force the current task to
> > wait until there are available pages. That is similar to the behavior in
> > memcg1 when oom_kill_disable is set.
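To make the two behaviors in the example concrete, here is a simplified user-space model of the charge decision when the OOM killer finds no victim. Everything in it (struct memcg_model, try_charge_model(), simulate_memhog(), the 4 MiB step size) is hypothetical; it is not the kernel's try_charge(). It only contrasts the current force charge, which lets usage exceed memory.max, with the proposed wait, which keeps usage at the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a memory cgroup, not the kernel's struct mem_cgroup. */
struct memcg_model {
	uint64_t usage; /* plays the role of memory.current, in bytes */
	uint64_t max;   /* plays the role of memory.max, in bytes */
};

enum charge_result {
	CHARGE_OK,     /* fits under the limit */
	CHARGE_RETRY,  /* an OOM victim was killed; the caller would retry */
	CHARGE_FORCED, /* no victim: charged anyway, usage may exceed max */
	CHARGE_WAIT,   /* no victim: not charged, the task must wait (proposed) */
};

enum no_victim_policy { POLICY_FORCE, POLICY_WAIT };

static enum charge_result try_charge_model(struct memcg_model *m, uint64_t bytes,
					   bool victim_found, enum no_victim_policy p)
{
	if (m->usage + bytes <= m->max) {
		m->usage += bytes;
		return CHARGE_OK;
	}
	if (victim_found)
		return CHARGE_RETRY;
	if (p == POLICY_FORCE) {
		m->usage += bytes; /* the limit is bypassed */
		return CHARGE_FORCED;
	}
	return CHARGE_WAIT; /* nothing is charged; the task would sleep instead */
}

/*
 * Replays the example: memory.max = 200M, a memhog-like task faults in 1G
 * in 4 MiB steps, and oom_score_adj = -1000 means no victim is ever found.
 */
static uint64_t simulate_memhog(enum no_victim_policy p)
{
	const uint64_t step = 4ULL << 20;  /* 4 MiB */
	const uint64_t total = 1ULL << 30; /* 1 GiB */
	struct memcg_model m = { .usage = 0, .max = 200ULL << 20 };

	for (uint64_t done = 0; done < total; done += step) {
		if (try_charge_model(&m, step, false, p) == CHARGE_WAIT)
			break; /* in the model, the task stops making progress here */
	}
	return m.usage;
}
```

With the numbers from the example (200M = 209715200 bytes, 1G = 1073741824 bytes), simulate_memhog(POLICY_FORCE) ends at 1073741824 bytes, far above the limit, while simulate_memhog(POLICY_WAIT) stops at 209715200. The observed memory.current of 1097228288 is somewhat higher than the model's figure, presumably because other charges in the cgroup are not modeled.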
>
> The primary reason why we force the charge is because we _cannot_ wait
> indefinitely in the charge path because the current call chain might
> hold locks or other resources which could block a large part of the
> system. You are essentially reintroducing that behavior.
>

It seems my poor English misled you. The task is NOT waiting in the
charge path; it is actually waiting at the end of the page fault, where
it doesn't hold any locks. See the comment above
mem_cgroup_oom_synchronize():

/*
 * ...
 * Memcg supports userspace OOM handling where failed allocations must
 * sleep on a waitqueue until the userspace task resolves the
 * situation. Sleeping directly in the charge context with all kinds
 * of locks held is not a good idea, instead we remember an OOM state
 * in the task and mem_cgroup_oom_synchronize() has to be called at
 * the end of the page fault to complete the OOM handling.
 * ...
 */
bool mem_cgroup_oom_synchronize(bool handle)

> Is the above example a real usecase or you have just tried a test case
> that would trigger the problem?

On my server I found that the memory usage of a container was greater
than its limit. From dmesg I knew there were no killable tasks because
oom_score_adj was set to -1000. Then I wrote this test case to reproduce
the issue.
The issue can be triggered by a misconfigured oom_score_adj, and it can
also be triggered by a memory leak in a task whose oom_score_adj is -1000.

Thanks
Yafang
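The deferral pattern described in the comment above mem_cgroup_oom_synchronize() can be sketched in user space as follows. This is a minimal model under stated assumptions: struct task_model, charge_model(), page_fault_model() and oom_synchronize_model() are hypothetical names, not kernel API, and the "wait" is only a flag standing in for sleeping on a waitqueue. What it illustrates is that the charge path merely records the OOM state and fails, and the task waits only after the fault has unwound and dropped its locks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-task state, loosely mirroring "remember an OOM state in the task". */
struct task_model {
	int locks_held;         /* locks taken on the fault path */
	bool oom_pending;       /* set by the charge path instead of sleeping */
	bool waited;            /* the task slept waiting for memory */
	bool waited_with_locks; /* a wait with locks held would be a deadlock hazard */
};

/* Charge path: must not sleep, because the caller may hold locks. */
static int charge_model(struct task_model *t, bool over_limit)
{
	if (!over_limit)
		return 0;
	t->oom_pending = true; /* defer the wait instead of sleeping here */
	return -ENOMEM;
}

/* End of the page fault: the locks have been released, so waiting is safe. */
static void oom_synchronize_model(struct task_model *t)
{
	if (!t->oom_pending)
		return;
	if (t->locks_held > 0)
		t->waited_with_locks = true;
	t->waited = true; /* stands in for sleeping on a waitqueue */
	t->oom_pending = false;
}

static int page_fault_model(struct task_model *t, bool over_limit)
{
	t->locks_held++;                        /* a lock taken while handling the fault */
	int ret = charge_model(t, over_limit);
	t->locks_held--;                        /* the failed fault unwinds and drops it */
	oom_synchronize_model(t);               /* only now may the task wait */
	return ret;
}
```

A fault that hits the limit gets -ENOMEM from the charge, and oom_synchronize_model() then performs the wait with locks_held == 0, so waited_with_locks stays false.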