From: Yosry Ahmed
Date: Tue, 22 Nov 2022 16:49:54 -0800
Subject: Re: [PATCH] mm: memcg: fix stale protection of reclaim target memcg
To: Roman Gushchin
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao, Muchun Song,
    "Matthew Wilcox (Oracle)", Vasily Averin, Vlastimil Babka, Chris Down,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20221122232721.2306102-1-yosryahmed@google.com>

On Tue, Nov 22, 2022 at 4:45 PM Yosry Ahmed wrote:
>
> On Tue, Nov 22, 2022 at 4:37 PM Roman Gushchin wrote:
> >
> > On Tue, Nov 22, 2022 at 11:27:21PM +0000, Yosry Ahmed wrote:
> > > During reclaim, mem_cgroup_calculate_protection() is used to determine
> > > the effective protection (emin and elow) values of a memcg. The
> > > protection of the reclaim target is ignored, but we cannot set their
> > > effective protection to 0 due to a limitation of the current
> > > implementation (see comment in mem_cgroup_protection()).
> > > Instead, we leave their effective protection values unchanged, and
> > > later ignore them in mem_cgroup_protection().
> > >
> > > However, mem_cgroup_protection() is called later in
> > > shrink_lruvec()->get_scan_count(), which is after the
> > > mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
> > > result, the stale effective protection values of the target memcg may
> > > lead us to skip reclaiming from the target memcg entirely, before
> > > calling shrink_lruvec(). This can be even worse with recursive
> > > protection, where the stale target memcg protection can be higher than
> > > its standalone protection.
> > >
> > > An example where this can happen is as follows. Consider the following
> > > hierarchy with memory_recursiveprot:
> > > ROOT
> > >  |
> > >  A (memory.min = 50M)
> > >  |
> > >  B (memory.min = 10M, memory.high = 40M)
> > >
> > > Consider the following scenario:
> > > - B has memory.current = 35M.
> > > - The system undergoes global reclaim (target memcg is NULL).
> > > - B will have an effective min of 50M (all of A's unclaimed protection).
> > > - B will not be reclaimed from.
> > > - Now allocate 10M more memory in B, pushing it above its high limit.
> > > - The system undergoes memcg reclaim from B (target memcg is B).
> > > - In shrink_node_memcgs(), we call mem_cgroup_calculate_protection(),
> > >   which immediately returns for B without doing anything, as B is the
> > >   target memcg, relying on mem_cgroup_protection() to ignore B's stale
> > >   effective min (still 50M).
> > > - Directly after mem_cgroup_calculate_protection(), we will call
> > >   mem_cgroup_below_min(), which will read the stale effective min for B
> > >   and skip it (instead of ignoring its protection as intended). In this
> > >   case, it's really bad because we are not just considering B's
> > >   standalone protection (10M), but we are reading a much higher stale
> > >   protection (50M) which will cause us to not reclaim from B at all.
> > >
> > > This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
> > > e{low,min} state mutations from protection checks") which made
> > > mem_cgroup_calculate_protection() only change the state without
> > > returning any value. Before that commit, we used to return
> > > MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
> > > mem_cgroup_below_{min/low}() checks. After that commit we do not return
> > > anything and we end up checking the min & low effective protections for
> > > the target memcg, which are stale.
> > >
> > > Add mem_cgroup_ignore_protection() that checks if we are reclaiming from
> > > the target memcg, and call it in mem_cgroup_below_{min/low}() to ignore
> > > the stale protection of the target memcg.
> > >
> > > Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
> > > Signed-off-by: Yosry Ahmed
> >
> > Great catch!
> > The fix looks good to me, only a couple of cosmetic suggestions.
> >
> > > ---
> > >  include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
> > >  mm/vmscan.c                | 11 ++++++-----
> > >  2 files changed, 33 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index e1644a24009c..22c9c9f9c6b1 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -625,18 +625,32 @@ static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg)
> > >
> > >  }
> > >
> > > -static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
> > > +static inline bool mem_cgroup_ignore_protection(struct mem_cgroup *target,
> > > +                                                struct mem_cgroup *memcg)
> > >  {
> > > -       if (!mem_cgroup_supports_protection(memcg))
> >
> > How about merging mem_cgroup_supports_protection() and your new helper into
> > something like mem_cgroup_possibly_protected()? It seems like they are never
> > used separately and are unlikely to ever be.
>
> Sounds good!
> I am thinking maybe mem_cgroup_no_protection(), which is
> an inlining of !mem_cgroup_supports_protection() ||
> mem_cgroup_ignore_protection().
>
> > Also, I'd swap target and memcg arguments.
>
> Sounds good.

I just remembered: the reason I put "target" first is to match the
argument ordering of mem_cgroup_calculate_protection(); otherwise the
code in shrink_node_memcgs() may be confusing.

>
> >
> > Thank you!
> >
> >
> > PS If it's not too hard, please, consider adding a new kselftest to cover this case.
>
> Thank you!
>
> I will try to translate my bash test to something in test_memcontrol.
> I don't plan to spend a lot of time on it though, so I hope it's simple
> enough.
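
For illustration only, the stale-protection example and the merged helper
floated above can be modelled in userspace C. This is a rough sketch, not
kernel code: struct memcg_model, below_min_stale(), no_protection(),
below_min_fixed() and replay_example() are hypothetical names, the root
check is a simplification of mem_cgroup_supports_protection(), and the
numbers are the ones from the example (B at 45M with a cached effective
min of 50M).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; NOT the kernel's struct mem_cgroup. */
struct memcg_model {
	bool is_root;
	unsigned long usage_mb;	/* models memory.current */
	unsigned long emin_mb;	/* models the cached effective min */
};

/* Pre-fix behaviour: the cached effective min is trusted unconditionally. */
static bool below_min_stale(const struct memcg_model *memcg)
{
	if (memcg->is_root)
		return false;
	return memcg->emin_mb >= memcg->usage_mb;
}

/*
 * Models the merged helper suggested in the thread. "target" comes first
 * to mirror mem_cgroup_calculate_protection(); NULL means global reclaim.
 * Assumption: the root check stands in for "protection not supported".
 */
static bool no_protection(const struct memcg_model *target,
			  const struct memcg_model *memcg)
{
	return memcg->is_root || memcg == target;
}

/* Post-fix behaviour: the reclaim target's cached protection is ignored. */
static bool below_min_fixed(const struct memcg_model *target,
			    const struct memcg_model *memcg)
{
	if (no_protection(target, memcg))
		return false;
	return memcg->emin_mb >= memcg->usage_mb;
}

/* Replays the example; returns true if the fix stops B from being skipped. */
bool replay_example(void)
{
	/* B cached emin = 50M during global reclaim, then grew to 45M. */
	struct memcg_model b = { .is_root = false, .usage_mb = 45, .emin_mb = 50 };

	assert(below_min_stale(&b));		/* stale check skips B */
	assert(below_min_fixed(NULL, &b));	/* global reclaim: still protected */
	return !below_min_fixed(&b, &b);	/* memcg reclaim from B: not skipped */
}
```

In this model, reclaim targeting B no longer skips B, while global reclaim
(target == NULL) still honours the 50M effective min.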