From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=jIX+=AZ=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,
	SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5AE2FC433E1
	for <linux-mm@archiver.kernel.org>; Tue, 14 Jul 2020 13:25:45 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 1936320DD4
	for <linux-mm@archiver.kernel.org>; Tue, 14 Jul 2020 13:25:45 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Om387B/O"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1936320DD4
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id AA3C56B0003; Tue, 14 Jul 2020 09:25:44 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id A533C6B0005; Tue, 14 Jul 2020 09:25:44 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 943208D0001; Tue, 14 Jul 2020 09:25:44 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0167.hostedemail.com [216.40.44.167])
	by kanga.kvack.org (Postfix) with ESMTP id 7B8526B0003
	for <linux-mm@kvack.org>; Tue, 14 Jul 2020 09:25:44 -0400 (EDT)
Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay01.hostedemail.com (Postfix) with ESMTP id 02A29180AD81D
	for <linux-mm@kvack.org>; Tue, 14 Jul 2020 13:25:43 +0000 (UTC)
X-FDA: 77036753808.25.plant06_1a05f6526ef1
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin25.hostedemail.com (Postfix) with ESMTP id 75CF61804E3A9
	for <linux-mm@kvack.org>; Tue, 14 Jul 2020 13:25:41 +0000 (UTC)
X-HE-Tag: plant06_1a05f6526ef1
X-Filterd-Recvd-Size: 7806
Received: from mail-il1-f196.google.com (mail-il1-f196.google.com [209.85.166.196])
	by imf14.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Tue, 14 Jul 2020 13:25:41 +0000 (UTC)
Received: by mail-il1-f196.google.com with SMTP id a11so14198985ilk.0
        for <linux-mm@kvack.org>; Tue, 14 Jul 2020 06:25:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=VQUv3rcXIfvcxgNGL/OrKySKezpV87+tRW6v8ORfd8Q=;
        b=Om387B/OgZM1KIXhEH7ISPWrKNf9wRoEDKXERfm/WJpS+botkNkzoAfjkxlScB7I0f
         ROc5xo40HklQIwV+XgY4C5jzOPLOXwRRVnFAneGNu/ZXIoB9onc/kQO5MjAfQdb3PvFm
         97afu2A0sZOPOA6PeQLZ5ddINnfzt8b4AOBDG1KN1KBnlJvSV3XJeL96MGoys1rgcCR0
         519uuXnLrurR6bgCPNvnVOMp4O2hzLbm7E4PwlhcUtoxQdYFDHwTxCEkZ7PcQupByBC5
         YeLH7HNoQI7AeUl+H+ltfOf5fdZEkrUO8ALMgPt1PEAhoa4DgR+ujjgwyQQDBll6+suI
         wcHw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=VQUv3rcXIfvcxgNGL/OrKySKezpV87+tRW6v8ORfd8Q=;
        b=oT7zSsaZ+NvHG770eawN/9AP+6NIAgumwHCYMWM6rVmvI1g3NE61PRqq/Qh8ej7ctS
         YqnQxUnv8ETW4o7fh3wTJGIdOXIx9l41Cc23auYMp5f9sUHFGyM9FdqMgOzhUZzyEUh0
         GkUUm9VEf/gHoGRAEtg7ZeJ82SrJhJyUbaLzV83ql2KPnB2Cd/A8TYT+1FnyBabL10h/
         r+qENJJyQvHEwSC7FVlz6Q/zIUHC3L2OgbeLa5qud1xKBWMWU5bpD+/WGRBxICsrJZUZ
         R34PUPk1lGpkqTMoqIULAMyPXUspH+wR5cME92RyX35Cj07VI5wBaPg2ksqhqHPXd0XB
         XvTw==
X-Gm-Message-State: AOAM530UmxDmv58WBbS9nU0a8NToifGG3wNt6tv0hs9xUJVf3h+88QQb
	TacsoQlgpDSpmOca3viX08XayqUyDiPDztic9oo=
X-Google-Smtp-Source: ABdhPJwveuxHPNpqHMtSleCElmCHawxmYP+qpX5hjPH2C4jUzdIAI6AtRu+M+Y62bBeJddaDNjh9SaL45slFBG1E1hY=
X-Received: by 2002:a92:da4c:: with SMTP id p12mr4774639ilq.142.1594733140573;
 Tue, 14 Jul 2020 06:25:40 -0700 (PDT)
MIME-Version: 1.0
References: <1594728512-18969-1-git-send-email-laoar.shao@gmail.com> <20200714123726.GI24642@dhcp22.suse.cz>
In-Reply-To: <20200714123726.GI24642@dhcp22.suse.cz>
From: Yafang Shao <laoar.shao@gmail.com>
Date: Tue, 14 Jul 2020 21:25:04 +0800
Message-ID: <CALOAHbBZuvVNnYS1=KkLH4kioVAoFDeqJMe_hkfZ2dkPnWR7Kw@mail.gmail.com>
Subject: Re: [PATCH] mm, oom: check memcg margin for parallel oom
To: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>, 
	David Rientjes <rientjes@google.com>, Andrew Morton <akpm@linux-foundation.org>, 
	Linux MM <linux-mm@kvack.org>
Content-Type: text/plain; charset="UTF-8"
X-Rspamd-Queue-Id: 75CF61804E3A9
X-Spamd-Result: default: False [0.00 / 100.00]
X-Rspamd-Server: rspam05
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Tue, Jul 14, 2020 at 8:37 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 14-07-20 08:08:32, Yafang Shao wrote:
> > The commit 7775face2079 ("memcg: killed threads should not invoke memcg OOM
> > killer") resolves the problem that different threads in a multi-threaded
> > task doing parallel memcg oom, but it doesn't solve the problem that
> > different tasks doing parallel memcg oom.
> >
> > It may happens that many different tasks in the same memcg are waiting
> > oom_lock at the same time, if one of them has already made progress and
> > freed enough available memory, the others don't need to trigger the oom
> > killer again. By checking memcg margin after hold oom_lock can help
> > achieve it.
>
> While the changelog makes sense it I believe it can be improved. I would
> use something like the following. Feel free to use its parts.
>
> "
> Memcg oom killer invocation is synchronized by the global oom_lock and
> tasks are sleeping on the lock while somebody is selecting the victim or
> potentially race with the oom_reaper is releasing the victim's memory.
> This can result in a pointless oom killer invocation because a waiter
> might be racing with the oom_reaper
>
>         P1              oom_reaper              P2
>                         oom_reap_task           mutex_lock(oom_lock)
>                                                 out_of_memory # no victim because we have one already
>                         __oom_reap_task_mm      mute_unlock(oom_lock)
> mutex_lock(oom_lock)
>                         set MMF_OOM_SKIP
> select_bad_process
> # finds a new victim
>
> The page allocator prevents from this race by trying to allocate after
> the lock can be acquired (in __alloc_pages_may_oom) which acts as a last
> minute check. Moreover page allocator simply doesn't block on the
> oom_lock and simply retries the whole reclaim process.
>
> Memcg oom killer should do the last minute check as well. Call
> mem_cgroup_margin to do that. Trylock on the oom_lock could be done as
> well but this doesn't seem to be necessary at this stage.
> "
>
> > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> > Cc: David Rientjes <rientjes@google.com>
> > ---
> >  mm/memcontrol.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 1962232..df141e1 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >               .gfp_mask = gfp_mask,
> >               .order = order,
> >       };
> > -     bool ret;
> > +     bool ret = true;
> >
> >       if (mutex_lock_killable(&oom_lock))
> >               return true;
> > +
> >       /*
> >        * A few threads which were not waiting at mutex_lock_killable() can
> >        * fail to bail out. Therefore, check again after holding oom_lock.
> >        */
> > -     ret = should_force_charge() || out_of_memory(&oc);
> > +     if (should_force_charge())
> > +             goto out;
> > +
> > +     /*
> > +      * Different tasks may be doing parallel oom, so after hold the
> > +      * oom_lock the task should check the memcg margin again to check
> > +      * whether other task has already made progress.
> > +      */
> > +     if (mem_cgroup_margin(memcg) >= (1 << order))
> > +             goto out;
>
> Is there any reason why you simply haven't done this? (+ your comment
> which is helpful).
>

No strong reason.
I just think that threads of a multi-thread task are more likely to do
parallel OOM, so I checked it first.
I can change it as you suggested below,  as it is more simple.


> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 248e6cad0095..2c176825efe3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1561,15 +1561,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>                 .order = order,
>                 .chosen_points = LONG_MIN,
>         };
> -       bool ret;
> +       bool ret = true;
>
> -       if (mutex_lock_killable(&oom_lock))
> +       if (!mutex_trylock(&oom_lock))
>                 return true;
> +
> +       if (mem_cgroup_margin(memcg) >= (1 << order))
> +               goto unlock;
> +
>         /*
>          * A few threads which were not waiting at mutex_lock_killable() can
>          * fail to bail out. Therefore, check again after holding oom_lock.
>          */
>         ret = should_force_charge() || out_of_memory(&oc);
> +
> +unlock:
>         mutex_unlock(&oom_lock);
>         return ret;
>  }
> > +
> > +     ret = out_of_memory(&oc);
> > +
> > +out:
> >       mutex_unlock(&oom_lock);
> > +
> >       return ret;
> >  }
> >
> > --
> > 1.8.3.1
>
> --
> Michal Hocko
> SUSE Labs


-- 
Thanks
Yafang