Date: Thu, 16 Jul 2020 12:53:14 -0700 (PDT)
From: David Rientjes
To: Yafang Shao
cc: Michal Hocko, Tetsuo Handa, Andrew Morton, Johannes Weiner, Linux MM
Subject: Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom
References: <1594735034-19190-1-git-send-email-laoar.shao@gmail.com>

On Thu, 16 Jul 2020, Yafang Shao wrote:

> > It's likely a misunderstanding: I wasn't necessarily concerned about
> > showing 60MB in a memcg limited to 100MB, that part we can deal with, the
> > concern was after dumping all of that great information that instead of
> > getting a "Killed process..." we get a "Oh, there's memory now, just
> > kidding about everything we just dumped" ;)
> >
>
> Actually the kernel is doing it now, see bellow,
>
> dump_header() <<<< dump lots of information
> __oom_kill_process
>     p = find_lock_task_mm(victim);
>     if (!p)
>         return; <<<< without killing any process.
>

Ah, this is catching an instance where the chosen process has already done
exit_mm(), good catch -- I can find examples of this by scraping kernel
logs from our fleet.

So it appears there is precedent for dumping all the oom info but not
actually performing any action for it, and I made the earlier point that
diagnostic information in the kernel log here is still useful.
I think it is still preferable that the kernel at least tell us why it
didn't do anything, but as you mention that already happens today.

Would you like to send a patch that checks for mem_cgroup_margin() here as
well? A second patch could make the possible inaction more visible,
something like "Process ${pid} (${comm}) is already exiting" for the above
check or "Memcg ${memcg} is no longer out of memory".

Another thing that these messages indicate, beyond telling us why the oom
killer didn't actually SIGKILL anything, is that we can expect some skew in
the memory stats that shows an availability of memory.

>
> > We could likely enlighten userspace about that so that we don't consider
> > that to be an actual oom kill. But I also very much agree that after
> > dump_header() would be appropriate as well since the goal is to prevent
> > unnecessary oom killing.
> >
> > Would you mind sending a patch to check mem_cgroup_margin() on
> > is_memcg_oom() prior to sending the SIGKILL to the victim and printing the
> > "Killed process..." line? We'd need a line that says "xKB of memory now
> > available -- suppressing oom kill" or something along those lines so
> > userspace understands what happened. But the memory info that it emits
> > both for the state of the memcg and system RAM may also be helpful to
> > understand why we got to the oom kill in the first place, which is also a
> > good thing.
> >
> > I'd happy ack that patch since it would be a comprehensive solution that
> > avoids oom kill of user processes at all costs, which is a goal I think we
> > can all rally behind.
>
> I'd prefer to put dump_header() behind do_send_sig_info(), for example,
>
> __oom_kill_process()
>     do_send_sig_info()
>     dump_header() <<<< may better put it behind wake_oom_reaper(), but
> it may loses some information to dump...
>     pr_err("%s: Killed process %d (%s)....")
>

I agree with Michal here that dump_header() after the actual kill would no
longer represent the state of the system (or cpuset or memcg, depending on
context) at the time of the oom kill, so it's best to dump relevant
information before the actual kill.
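
For illustration only, the ordering discussed in this thread (dump the state
first, then re-check, then kill) can be modelled in plain userspace C. This
is a sketch under stated assumptions, not kernel code and not a proposed
patch: the model_* types and functions, the needed_pages parameter, and the
format of the dump and "Killed process" lines are all made up for the
example. Only the wording of the two "inaction" messages follows the
suggestion earlier in this mail.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct model_memcg {
	const char *name;
	unsigned long limit_pages;
	unsigned long usage_pages;
};

struct model_task {
	int pid;
	const char *comm;
	bool has_mm;	/* false once the task has been through exit_mm() */
};

enum model_oom_result {
	MODEL_KILLED,
	MODEL_VICTIM_EXITING,
	MODEL_MEMORY_AVAILABLE,
};

/* Pages that could still be charged before the (modelled) limit is hit. */
static unsigned long model_margin(const struct model_memcg *memcg)
{
	if (memcg->usage_pages >= memcg->limit_pages)
		return 0;
	return memcg->limit_pages - memcg->usage_pages;
}

/*
 * Dump first, so the log reflects the state at the time of the oom, then
 * explain any inaction instead of silently returning.
 */
static enum model_oom_result model_oom_kill(const struct model_memcg *memcg,
					    const struct model_task *victim,
					    unsigned long needed_pages)
{
	/* Stand-in for dump_header(): emitted before any decision is made. */
	printf("oom-kill: memcg=%s usage=%lu limit=%lu (pages)\n",
	       memcg->name, memcg->usage_pages, memcg->limit_pages);

	/* Victim already dropped its mm: nothing left to kill. */
	if (!victim->has_mm) {
		printf("Process %d (%s) is already exiting\n",
		       victim->pid, victim->comm);
		return MODEL_VICTIM_EXITING;
	}

	/* Memory was freed in parallel: suppress the kill and say so. */
	if (model_margin(memcg) >= needed_pages) {
		printf("Memcg %s is no longer out of memory\n", memcg->name);
		return MODEL_MEMORY_AVAILABLE;
	}

	/* Stand-in for sending SIGKILL and printing the kill line. */
	printf("Killed process %d (%s)\n", victim->pid, victim->comm);
	return MODEL_KILLED;
}
```

Calling model_oom_kill() with a memcg at its limit and a live victim takes
the kill path; dropping usage below the limit, or clearing has_mm, takes one
of the two explained-inaction paths while the dump line is still emitted.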