From: Yafang Shao
Date: Tue, 14 Jul 2020 21:25:04 +0800
Subject: Re: [PATCH] mm, oom: check memcg margin for parallel oom
To: Michal Hocko
Cc: Tetsuo Handa, David Rientjes, Andrew Morton, Linux MM
In-Reply-To: <20200714123726.GI24642@dhcp22.suse.cz>
References: <1594728512-18969-1-git-send-email-laoar.shao@gmail.com> <20200714123726.GI24642@dhcp22.suse.cz>

On Tue, Jul 14, 2020 at 8:37 PM Michal Hocko wrote:
>
> On Tue 14-07-20 08:08:32, Yafang Shao wrote:
> > Commit 7775face2079 ("memcg: killed threads should not invoke memcg OOM
> > killer") resolves the problem of different threads in a multi-threaded
> > task doing parallel memcg oom, but it doesn't solve the problem of
> > different tasks doing parallel memcg oom.
> >
> > It may happen that many different tasks in the same memcg are waiting on
> > the oom_lock at the same time. If one of them has already made progress
> > and freed enough available memory, the others don't need to trigger the
> > oom killer again. Checking the memcg margin after taking the oom_lock
> > helps achieve that.
>
> While the changelog makes sense, I believe it can be improved. I would
> use something like the following. Feel free to use its parts.
>
> "
> Memcg oom killer invocation is synchronized by the global oom_lock and
> tasks are sleeping on the lock while somebody is selecting the victim or
> potentially racing with the oom_reaper releasing the victim's memory.
> This can result in a pointless oom killer invocation because a waiter
> might be racing with the oom_reaper:
>
> P1                    oom_reaper               P2
>                       oom_reap_task            mutex_lock(oom_lock)
>                                                out_of_memory
>                                                  # no victim because we have one already
>                       __oom_reap_task_mm       mutex_unlock(oom_lock)
> mutex_lock(oom_lock)
>                       set MMF_OOM_SKIP
> select_bad_process
>   # finds a new victim
>
> The page allocator prevents this race by trying to allocate once more
> after the lock has been acquired (in __alloc_pages_may_oom), which acts
> as a last-minute check. Moreover, the page allocator doesn't block on
> the oom_lock at all; it simply retries the whole reclaim process.
>
> The memcg oom killer should do the last-minute check as well. Call
> mem_cgroup_margin to do that. A trylock on the oom_lock could be done as
> well, but that doesn't seem to be necessary at this stage.
> " > > > Suggested-by: Michal Hocko > > Signed-off-by: Yafang Shao > > Cc: Michal Hocko > > Cc: Tetsuo Handa > > Cc: David Rientjes > > --- > > mm/memcontrol.c | 19 +++++++++++++++++-- > > 1 file changed, 17 insertions(+), 2 deletions(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 1962232..df141e1 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, > > .gfp_mask = gfp_mask, > > .order = order, > > }; > > - bool ret; > > + bool ret = true; > > > > if (mutex_lock_killable(&oom_lock)) > > return true; > > + > > /* > > * A few threads which were not waiting at mutex_lock_killable() can > > * fail to bail out. Therefore, check again after holding oom_lock. > > */ > > - ret = should_force_charge() || out_of_memory(&oc); > > + if (should_force_charge()) > > + goto out; > > + > > + /* > > + * Different tasks may be doing parallel oom, so after hold the > > + * oom_lock the task should check the memcg margin again to check > > + * whether other task has already made progress. > > + */ > > + if (mem_cgroup_margin(memcg) >= (1 << order)) > > + goto out; > > Is there any reason why you simply haven't done this? (+ your comment > which is helpful). > No strong reason. I just think that threads of a multi-thread task are more likely to do parallel OOM, so I checked it first. I can change it as you suggested below, as it is more simple. > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 248e6cad0095..2c176825efe3 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1561,15 +1561,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, > .order = order, > .chosen_points = LONG_MIN, > }; > - bool ret; > + bool ret = true; > > - if (mutex_lock_killable(&oom_lock)) > + if (!mutex_trylock(&oom_lock)) > return true; > + > + if (mem_cgroup_margin(memcg) >= (1 << order)) > + goto unlock; > + > /* > * A few threads which were not waiting at mutex_lock_killable() can > * fail to bail out. Therefore, check again after holding oom_lock. > */ > ret = should_force_charge() || out_of_memory(&oc); > + > +unlock: > mutex_unlock(&oom_lock); > return ret; > } > > + > > + ret = out_of_memory(&oc); > > + > > +out: > > mutex_unlock(&oom_lock); > > + > > return ret; > > } > > > > -- > > 1.8.3.1 > > -- > Michal Hocko > SUSE Labs -- Thanks Yafang