From: Yafang Shao
Date: Thu, 9 Jul 2020 15:31:23 +0800
Subject: Re: [PATCH] mm, oom: make the calculation of oom badness more accurate
To: Michal Hocko
Cc: David Rientjes, Andrew Morton, Linux MM
References: <1594214649-9837-1-git-send-email-laoar.shao@gmail.com>
 <20200708142806.GJ7271@dhcp22.suse.cz>
 <20200708143211.GK7271@dhcp22.suse.cz>
 <20200708190225.GM7271@dhcp22.suse.cz>
 <20200709062644.GA12704@dhcp22.suse.cz>
In-Reply-To: <20200709062644.GA12704@dhcp22.suse.cz>

On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko wrote:
>
> On Thu 09-07-20 10:14:14, Yafang Shao wrote:
> > On Thu, Jul 9, 2020 at 3:02 AM Michal Hocko wrote:
> > >
> > > On Wed 08-07-20 10:57:27, David Rientjes wrote:
> > > > On Wed, 8 Jul 2020, Michal Hocko wrote:
> > > >
> > > > > I have only now realized that David is not on Cc. Add him here. The
> > > > > patch is http://lkml.kernel.org/r/1594214649-9837-1-git-send-email-laoar.shao@gmail.com.
> > > > >
> > > > > I believe the main problem is that we are normalizing to oom_score_adj
> > > > > units rather than usage/total. I have a very vague recollection this has
> > > > > been done in the past but I didn't get to dig into details yet.
> > > > >
> > > >
> > > > The memcg max is 4194304 pages, and an oom_score_adj of -998 would yield a
> > > > page adjustment of:
> > > >
> > > >     adj = -998 * 4194304 / 1000 = −4185915 pages
> > > >
> > > > The largest pid 58406 (data_sim) has rss 3967322 pages,
> > > > pgtables 37101568 / 4096 = 9058 pages, and swapents 0.
> > > > So its unadjusted badness is
> > > >
> > > >     3967322 + 9058 pages = 3976380 pages
> > > >
> > > > Factoring in oom_score_adj, all of these processes will have a badness of
> > > > 1 because oom_badness() doesn't underflow, which I think is the point of
> > > > Yafang's proposal.
> > > >
> > > > I think the patch can work but, as you mention, also needs an update to
> > > > proc_oom_score(). proc_oom_score() is using the global amount of memory
> > > > so Yafang is likely not seeing it go negative for that reason but it could
> > > > happen.
> > >
> > > Yes, memcg just makes it more obvious but the same might happen for the
> > > global case. I am not sure how we can both allow underflow and present
> > > the value that would fit the existing model. The exported value should
> > > really reflect what the oom killer is using for the calculation or we
> > > are going to see discrepancies between the real oom decision and
> > > presented values. So I believe we really have to change the calculation
> > > rather than just make it tolerant to underflows.
> > >
> >
> > Hi Michal,
> >
> > - Before my patch,
> > The result of oom_badness() is [1, 2 * totalpages),
> > and the result of proc_oom_score() is [0, 2000).
> >
> > While the badness score in the Documentation/filesystems/proc.rst is: [0, 1000]
> > "The badness heuristic assigns a value to each candidate task ranging from 0
> > (never kill) to 1000 (always kill) to determine which process is targeted"
> >
> > That means, we need to update the documentation anyway unless my
> > calculation is wrong.
>
> No, your calculation is correct. The documentation is correct albeit
> slightly misleading. The net score calculation is indeed in range of [0, 1000].
> It is the oom_score_adj added on top which skews it. This is documented
> as
> "The value of /proc//oom_score_adj is added to the badness score before it
> is used to determine which task to kill."
>
> This is the exported value but paragraph "3.2 /proc//oom_score" only says
> "This file can be used to check the current score used by the oom-killer is for
> any given ." which is not really explicit about the exported range.
>
> Maybe clarifying that would be helpful. I will post a patch. There are
> few other things to sync up with the current state.
>
> > So the point will be how to change it?
> >
> > - After my patch
> > oom_badness(): (-totalpages, 2 * totalpages)
> > proc_oom_score(): (-1000, 2000)
> >
> > If we allow underflow, we can change the documentation as "from -1000
> > (never kill) to 2000 (always kill)".
> > While if we don't allow underflow, we can make the simple change below,
> >
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 774784587..0da8efa41 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -528,7 +528,7 @@ static int proc_oom_score(struct seq_file *m,
> > struct pid_namespace *ns,
> >         unsigned long totalpages = totalram_pages + total_swap_pages;
> >         unsigned long points = 0;
> >
> > -       points = oom_badness(task, NULL, NULL, totalpages) *
> > +       points = 1000 + oom_badness(task, NULL, NULL, totalpages) *
> >                                         1000 / totalpages;
> >         seq_printf(m, "%lu\n", points);
> >
> > And then update the documentation as "from 0 (never kill) to 3000
> > (always kill)"
>
> This is still not quite there yet, I am afraid. OOM_SCORE_ADJ_MIN tasks have
> always reported 0 and I can imagine somebody might depend on this fact.

No, I don't think anybody will rely on a reported 0 to conclude that a
task is an OOM_SCORE_ADJ_MIN task, because
    points = oom_badness(task, totalpages) * 1000 / totalpages;
so points will always be 0 whenever oom_badness(task, totalpages)
returns less than totalpages/1000.
A user who wants to know whether a task is an OOM_SCORE_ADJ_MIN task
will read /proc/[pid]/oom_score_adj instead, which is more reliable.

> So you need to special case LONG_MIN at least.
> It would be also better to stick with [0, 2000] range.

I don't see why it must stick with the [0, 2000] range.
Since oom_score_adj has a [-1000, 1000] range, I think proc_oom_score()
could report a negative value as well.

-- 
Thanks
Yafang
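
For reference, the arithmetic quoted in this thread can be checked with a small standalone C sketch. It is not kernel code: adj_pages(), badness_clamped(), badness_signed() and exported_score() are hypothetical helper names, and the formulas are only the simplified ones written out in the mails above (adj * totalpages / 1000 and points * 1000 / totalpages). The real oom_badness() and proc_oom_score() may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Page adjustment as written out in David's mail:
 * oom_score_adj * totalpages / 1000 (C division truncates toward zero). */
int64_t adj_pages(int64_t oom_score_adj, int64_t totalpages)
{
    return oom_score_adj * totalpages / 1000;
}

/* Hypothetical model of the behaviour before the patch:
 * a non-positive result is reported as 1, so it never goes negative. */
int64_t badness_clamped(int64_t usage, int64_t oom_score_adj,
                        int64_t totalpages)
{
    int64_t points = usage + adj_pages(oom_score_adj, totalpages);

    return points > 0 ? points : 1;
}

/* Hypothetical model of the proposal: let the result go negative. */
int64_t badness_signed(int64_t usage, int64_t oom_score_adj,
                       int64_t totalpages)
{
    return usage + adj_pages(oom_score_adj, totalpages);
}

/* Scaling discussed for /proc/[pid]/oom_score:
 * points * 1000 / totalpages. */
int64_t exported_score(int64_t points, int64_t totalpages)
{
    return points * 1000 / totalpages;
}
```

With David's numbers (totalpages = 4194304, usage = 3967322 + 9058 = 3976380, oom_score_adj = -998), badness_signed() gives -209535 pages. The clamped variant turns that into 1, which exports as 0, while a signed export would give -49. exported_score() also returns 0 for any badness below totalpages/1000 (4194 pages here), which is why a reported 0 does not identify an OOM_SCORE_ADJ_MIN task.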