From: Yang Shi
Date: Wed, 18 May 2022 13:42:35 -0700
Subject: Re: [PATCH] Revert "mm/vmscan: never demote for memcg reclaim"
To: Johannes Weiner
Cc: Dave Hansen, "Huang, Ying", Yang Shi, Andrew Morton, Linux MM, Cgroups, Linux Kernel Mailing List, Kernel Team, Zi Yan, Michal Hocko, Shakeel Butt, Roman Gushchin
In-Reply-To: <20220518190911.82400-1-hannes@cmpxchg.org>

On Wed, May 18, 2022 at 12:09 PM Johannes Weiner wrote:
>
> This reverts commit 3a235693d3930e1276c8d9cc0ca5807ef292cf0a.
>
> Its premise was that cgroup reclaim cares about freeing memory inside
> the cgroup, and demotion just moves them around within the cgroup
> limit. Hence, pages from toptier nodes should be reclaimed directly.

Yes, exactly.

>
> However, with NUMA balancing now doing tier promotions, demotion is
> part of the page aging process. Global reclaim demotes the coldest
> toptier pages to secondary memory, where their life continues and from
> which they have a chance to get promoted back. Essentially, tiered
> memory systems have an LRU order that spans multiple nodes.
>
> When cgroup reclaims pages coming off the toptier directly, there can
> be colder pages on lower tier nodes that were demoted by global
> reclaim. This is an aging inversion, not unlike if cgroups were to
> reclaim directly from the active lists while there are inactive pages.

Thanks for pointing this out, makes sense to me.

>
> Proactive reclaim is another factor. The goal of that it is to offload
> colder pages from expensive RAM to cheaper storage. When lower tier
> memory is available as an intermediate layer, we want offloading to
> take advantage of it instead of bypassing to storage.
>
> Revert the patch so that cgroups respect the LRU order spanning the
> memory hierarchy.
>
> Of note is a specific undercommit scenario, where all cgroup limits in
> the system add up to <= available toptier memory. In that case,
> shuffling pages out to lower tiers first to reclaim them from there is
> inefficient. This is something could be optimized/short-circuited
> later on (although care must be taken not to accidentally recreate the
> aging inversion). Let's ensure correctness first.

Some side effects we may want to keep an eye on with this revert:
  - Limit reclaim may experience longer latency, since it has to do
    demotion + reclaim to uncharge enough memory.
  - Higher max usage due to the force charge from migration (of course
    other migrations, e.g. NUMA fault, could have a similar effect, but
    this is one more contributing factor).

Hopefully they will not be noticeable, and I tend to agree that
keeping aging correct may be more important.

Reviewed-by: Yang Shi

>
> Signed-off-by: Johannes Weiner
> Cc: Dave Hansen
> Cc: "Huang, Ying"
> Cc: Yang Shi
> Cc: Zi Yan
> Cc: Michal Hocko
> Cc: Shakeel Butt
> Cc: Roman Gushchin
> ---
>  mm/vmscan.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c6918fff06e1..7a4090712177 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -528,13 +528,8 @@ static bool can_demote(int nid, struct scan_control *sc)
>  {
>  	if (!numa_demotion_enabled)
>  		return false;
> -	if (sc) {
> -		if (sc->no_demotion)
> -			return false;
> -		/* It is pointless to do demotion in memcg reclaim */
> -		if (cgroup_reclaim(sc))
> -			return false;
> -	}
> +	if (sc && sc->no_demotion)
> +		return false;
>  	if (next_demotion_node(nid) == NUMA_NO_NODE)
>  		return false;
>
> --
> 2.36.1
>
>
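
For anyone following along, here is a minimal userspace sketch of how the can_demote() decision changes with the revert. It is only a model of the hunk above, not kernel code: the struct layout, the memcg_reclaim flag (standing in for cgroup_reclaim(sc) returning true), the two-node topology in next_demotion_node(), and the final "return true" (which falls outside the quoted diff context) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the kernel's struct scan_control. */
struct scan_control {
	bool no_demotion;	/* caller explicitly forbids demotion */
	bool memcg_reclaim;	/* models cgroup_reclaim(sc) being true */
};

/* Assumed to be on, as if the demotion knob had been enabled. */
static bool numa_demotion_enabled = true;

/* Toy topology (assumption): node 0 demotes to node 1, node 1 has no target. */
static int next_demotion_node(int nid)
{
	return nid == 0 ? 1 : NUMA_NO_NODE;
}

/* Stand-in for the kernel helper of the same name. */
static bool cgroup_reclaim(const struct scan_control *sc)
{
	return sc->memcg_reclaim;
}

/* Before the revert: memcg reclaim never demotes. */
static bool can_demote_old(int nid, const struct scan_control *sc)
{
	if (!numa_demotion_enabled)
		return false;
	if (sc) {
		if (sc->no_demotion)
			return false;
		if (cgroup_reclaim(sc))
			return false;
	}
	if (next_demotion_node(nid) == NUMA_NO_NODE)
		return false;
	return true;	/* assumed tail of the function, not shown in the diff */
}

/* After the revert: only no_demotion and the topology gate demotion. */
static bool can_demote_new(int nid, const struct scan_control *sc)
{
	if (!numa_demotion_enabled)
		return false;
	if (sc && sc->no_demotion)
		return false;
	if (next_demotion_node(nid) == NUMA_NO_NODE)
		return false;
	return true;	/* assumed tail of the function, not shown in the diff */
}
```

In this model the only input for which the two versions disagree is memcg reclaim on a node that has a demotion target: the old predicate refuses to demote, the new one allows it. Global reclaim, no_demotion callers, and nodes without a lower tier behave the same either way.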