Subject: Re: [PATCH] mm, oom: show process exiting information in __oom_kill_process()
To: Yafang Shao
Cc: Michal Hocko, David Rientjes, Andrew Morton, Linux MM
References: <1595166795-27587-1-git-send-email-laoar.shao@gmail.com> <20200720071607.GA18535@dhcp22.suse.cz> <253332d9-9f8c-d472-0bf4-388b29ecfb96@i-love.sakura.ne.jp>
From: Tetsuo Handa
Message-ID: <7f58363a-db1a-5502-e2b4-ee4b9fa31824@i-love.sakura.ne.jp>
Date: Mon, 20 Jul 2020 22:11:07 +0900

On 2020/07/20 21:19, Yafang Shao wrote:
> On Mon,
> Jul 20, 2020 at 7:06 PM Tetsuo Handa wrote:
>>
>> On 2020/07/20 19:36, Yafang Shao wrote:
>>> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko wrote:
>>>> I do agree that a silent bail out is not the best thing to do. The above
>>>> message would be more useful if it also explained what the oom killer
>>>> does (or does not):
>>>>
>>>> "OOM victim %d (%s) is already exiting. Skip killing the task\n"
>>>>
>>>
>>> Sure.
>>
>> This path is rarely hit because find_lock_task_mm() in oom_badness() from
>> select_bad_process() in the next round of the OOM killer will skip this task.
>>
>> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
>> for this task itself immediately reclaims memory and updates the statistics
>> counter, we just get two chunks of dump_header() messages and one OOM victim.
>>
>
> Could you pls. explain more specifically why we will get two chunks of
> dump_header()?
> My understanding is that free_mm() happens between select_bad_process()
> and __oom_kill_process() as below,
>
>   P1                                  Victim
>
>   select_bad_process()
>     oom_badness()
>       p = find_lock_task_mm()         # p isn't NULL
>                                       __mmput()
>                                         free_mm()
>   dump_header()                       # dump once
>   __oom_kill_process()
>     p = find_lock_task_mm(victim);    # p is NULL now
>
> So where is another dump_header() ?
>

Start of __mmput() does not guarantee that memory is reclaimed immediately.
Moreover, __mmput() might not even have started by the time the second chunk
of dump_header() output is printed. The "OOM victim %d (%s) is already
exiting." case only indicates that the victim's mm became NULL; there is no
guarantee that memory has been reclaimed (so as to avoid another OOM kill) by
the time the next round runs.

  P1                                      Victim1                  Victim2

  out_of_memory() {
    select_bad_process() {
      oom_badness() {
        p = find_lock_task_mm() {
          task_lock(victim); // finds Victim1 because Victim1->mm != NULL.
        }
        get_task_struct(p);
        task_unlock(p);
      }
    }
    oom_kill_process() {
      task_lock(victim);
      task_unlock(victim);
                                          do_exit() {
      dump_header(oc, victim); // first dump_header() with Victim1 and Victim2
      __oom_kill_process(victim, message) {
                                            exit_mm() {
                                              task_lock(current);
                                              current->mm = NULL;
                                              task_unlock(current);
        p = find_lock_task_mm(victim);
        put_task_struct(victim); // without killing Victim1 because p == NULL.
                                            }
      }
    }
  }
  out_of_memory() {
    select_bad_process() {
      oom_badness() {
        p = find_lock_task_mm() {
          task_lock(victim); // finds Victim2 because Victim2->mm != NULL.
        }
        get_task_struct(p);
        task_unlock(p);
      }
    }
                                            mmput() {
                                              __mmput() {
                                                uprobe_clear_state() {
                                                  // Might wait for delayed_uprobe_lock.
                                                }
    oom_kill_process() {
      task_lock(victim);
      task_unlock(victim);
      dump_header(oc, victim); // second dump_header() with Victim2
      __oom_kill_process(victim, message) {
        p = find_lock_task_mm(victim);
        pr_err("%s: Killed process %d (%s) "...); // first kill message.
        put_task_struct(p);
      }
    }
  }
                                                exit_mmap(); // Which frees memory.
                                              }
                                            }
                                          }

Maybe the better behavior is to restart out_of_memory() without calling
dump_header() again (we can remember whether we already called dump_header()
in "struct oom_control"), with a last-second watermark check before
select_bad_process() and after dump_header().