From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64874C54FCB for ; Fri, 24 Apr 2020 16:01:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 24D1020706 for ; Fri, 24 Apr 2020 16:01:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="vHzZe0VL" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 24D1020706 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A5AB88E0005; Fri, 24 Apr 2020 12:01:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A0A7C8E0003; Fri, 24 Apr 2020 12:01:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8F9408E0005; Fri, 24 Apr 2020 12:01:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0103.hostedemail.com [216.40.44.103]) by kanga.kvack.org (Postfix) with ESMTP id 7443C8E0003 for ; Fri, 24 Apr 2020 12:01:39 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E3877180AD817 for ; Fri, 24 Apr 2020 16:01:38 +0000 (UTC) X-FDA: 76743213876.12.gold23_e55104f57060 X-HE-Tag: gold23_e55104f57060 X-Filterd-Recvd-Size: 8127 Received: from mail-io1-f68.google.com (mail-io1-f68.google.com [209.85.166.68]) by imf04.hostedemail.com (Postfix) with ESMTP for ; Fri, 24 Apr 2020 16:01:38 +0000 (UTC) Received: by mail-io1-f68.google.com with SMTP id f3so10903389ioj.1 for ; Fri, 24 Apr 2020 09:01:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=6A+MhonSmWKJ8vta63r2FLDB2sShhaIfgrK791Ii/3c=; b=vHzZe0VLo9zZ+RkgHSb7I8G+nRlecaRckFxQPxzX8eKctjij/W8MLszd6t+8J9pkdr 0GfLOrJ6NaJd3Up2o8n+v90lX9jHFQTs6nbyTQs/YDiPq8A5y2tYuYIA4T8njSbfDxC5 fkQLOvq6IJhh25CowZRgNFWXxkMkp9AU5KfNIKmq4h95Nwx0pH/4hcCjdHMSIpr6lERa 5Z4yTg6JUmMLgE52QusXd28/Gx7QUsoajT1Xu2hwqck4T9kcuBNYZiIVz0rj5LWh6n3G fWAEQI5OCY/CC63iBHZMDYv0GUgTJy87sAAWTEJT33uh/GY5c0+0J+zAHpEfDR96O87h bLlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6A+MhonSmWKJ8vta63r2FLDB2sShhaIfgrK791Ii/3c=; b=FYaZlXR6U8mZ8zoMgr1eEDb7vzkb7nLqFcIxTLOLbpYvTg//ZVxcxz6eC0kLo03cDQ V3gFwQUrtfKNE26nX9MH7oYnGuh9GKrh6/36n8jbG1s6q6Qp0U3b9sQ4a7ZK2aDjG0r8 8/32Ec3V9wwmwhJIO6KlDPGCcG95N3blKqYIlRs5rMsX8Sh1rXlQFV1UP66g4XnXsCGo kDkWo5WiiZgbK+1imlzbgAputfmXLArbbn+cwlt95gqqxnbZlXeTe+XeOR2ub30x8XAO IIX2FQ2lRa0QI12xCgX7LbUcOnE64b8lFGlcJE+QrSJDx2O0f2K5mkvk5HSPdhHsDJdj opuQ== X-Gm-Message-State: AGi0PuZMMr/UZar6pd2BdTaNg+LhM/09VY8zgphGbQw8IwGlUnh+8gXb HZdj1zjFomqeX2pWMpHrq9vLT8qjx7c5QXk4a9I= X-Google-Smtp-Source: APiQypKc+iUtIXqoiu/p/CcQmYzyVG9ADART+Z2kJVsA4Lm9bbq2f/YaeX0LDezDL2+0kZ/LOdoP8Di2Vg4UN3EU6hU= X-Received: by 2002:a02:5184:: with SMTP id s126mr8780707jaa.81.1587744097389; Fri, 24 Apr 2020 09:01:37 -0700 (PDT) MIME-Version: 1.0 References: <20200423061629.24185-1-laoar.shao@gmail.com> <20200424131450.GA495720@cmpxchg.org> In-Reply-To: <20200424131450.GA495720@cmpxchg.org> From: Yafang Shao Date: Sat, 25 Apr 2020 00:00:59 +0800 Message-ID: Subject: Re: [PATCH] mm, memcg: fix wrong mem cgroup protection To: Johannes Weiner Cc: Andrew Morton , Michal Hocko , Vladimir Davydov , Linux MM , Chris Down , Roman Gushchin , stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Apr 24, 2020 at 9:14 PM Johannes Weiner wrote: > > On Thu, Apr 23, 2020 at 02:16:29AM -0400, Yafang Shao wrote: > > This patch is an improvement of a previous version[1], as the previous > > version is not easy to understand. > > This issue persists in the newest kernel, I have to resend the fix. As > > the implementation is changed, I drop Roman's ack from the previous > > version. > > Now that I understand the problem, I much prefer the previous version. > Great news that this issue is understood by one more reviewer. > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 745697906ce3..2bf91ae1e640 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6332,8 +6332,19 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root, > > if (!root) > root = root_mem_cgroup; > - if (memcg == root) > + if (memcg == root) { > + /* > + * The cgroup is the reclaim root in this reclaim > + * cycle, and therefore not protected. But it may have > + * stale effective protection values from previous > + * cycles in which it was not the reclaim root - for > + * example, global reclaim followed by limit reclaim. > + * Reset these values for mem_cgroup_protection(). > + */ > + memcg->memory.emin = 0; > + memcg->memory.elow = 0; > return MEMCG_PROT_NONE; > + } > > usage = page_counter_read(&memcg->memory); > if (!usage) > > > Here's the explanation of this issue. > > memory.{low,min} won't take effect if the to-be-reclaimed memcg is the > > sc->target_mem_cgroup, that can also be proved by the implementation in > > mem_cgroup_protected(), see bellow, > > mem_cgroup_protected > > if (memcg == root) [2] > > return MEMCG_PROT_NONE; > > > > But this rule is ignored in mem_cgroup_protection(), which will read > > memory.{emin, elow} as the protection whatever the memcg is. > > > > How would this issue happen? > > Because in mem_cgroup_protected() we forget to clear the > > memory.{emin, elow} if the memcg is target_mem_cgroup [2]. > > > > An example to illustrate this issue. > > root_mem_cgroup > > / > > A memory.max: 1024M > > memory.min: 512M > > memory.current: 800M ('current' must be greater than 'min') > > Once kswapd starts to reclaim memcg A, it assigns 512M to memory.emin of A. > > Then kswapd stops. > > As a result of it, the memory values of A will be, > > root_mem_cgroup > > / > > A memory.max: 1024M > > memory.min: 512M > > memory.current: 512M (approximately) > > memory.emin: 512M > > > > Then a new workload starts to run in memcg A, and it will trigger memcg > > relcaim in A soon. As memcg A is the target_mem_cgroup of this > > reclaimer, so it return directly without touching memory.{emin, elow}.[2] > > The memory values of A will be, > > root_mem_cgroup > > / > > A memory.max: 1024M > > memory.min: 512M > > memory.current: 1024M (approximately) > > memory.emin: 512M > > Then this memory.emin will be used in mem_cgroup_protection() to get the > > scan count, which is obvoiusly a wrong scan count. > > > > [1]. https://lore.kernel.org/linux-mm/20200216145249.6900-1-laoar.shao@gmail.com/ > > > > Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim") > > Cc: Chris Down > > Cc: Roman Gushchin > > Cc: stable@vger.kernel.org > > Signed-off-by: Yafang Shao > > As others have noted, it's fairly hard to understand the problem from > the above changelog. How about the following: > > A cgroup can have both memory protection and a memory limit to isolate > it from its siblings in both directions - for example, to prevent it > from being shrunk below 2G under high pressure from outside, but also > from growing beyond 4G under low pressure. > > 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim") > implemented proportional scan pressure so that multiple siblings in > excess of their protection settings don't get reclaimed equally but > instead in accordance to their unprotected portion. > > During limit reclaim, this proportionality shouldn't apply of course: > there is no competition, all pressure is from within the cgroup and > should be applied as such. Reclaim should operate at full efficiency. > > However, mem_cgroup_protected() never expected anybody to look at the > effective protection values when it indicated that the cgroup is above > its protection. As a result, a query during limit reclaim may return > stale protection values that were calculated by a previous reclaim > cycle in which the cgroup did have siblings. > > When this happens, reclaim is unnecessarily hesitant and potentially > slow to meet the desired limit. In theory this could lead to premature > OOM kills, although it's not obvious this has occurred in practice. Thanks a lot for your improvement on the change log. -- Thanks Yafang