Date: Wed, 23 Nov 2022 16:40:16 -0800
From: Roman Gushchin
To: Yosry Ahmed
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao, Muchun Song,
    "Matthew Wilcox (Oracle)", Vasily Averin, Vlastimil Babka, Chris Down,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 1/3] mm: memcg: fix stale protection of reclaim target memcg
References: <20221123092132.2521764-1-yosryahmed@google.com>
    <20221123092132.2521764-2-yosryahmed@google.com>
In-Reply-To: <20221123092132.2521764-2-yosryahmed@google.com>

On Wed, Nov 23, 2022 at 09:21:30AM +0000, Yosry Ahmed wrote:
> During reclaim, mem_cgroup_calculate_protection() is used to determine
> the effective protection (emin and elow) values of a memcg. The
> protection of the reclaim target is ignored, but we cannot set their
> effective protection to 0 due to a limitation of the current
> implementation (see comment in mem_cgroup_protection()). Instead,
> we leave their effective protection values unchanged, and later ignore it
> in mem_cgroup_protection().
>
> However, mem_cgroup_protection() is called later in
> shrink_lruvec()->get_scan_count(), which is after the
> mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
> result, the stale effective protection values of the target memcg may
> lead us to skip reclaiming from the target memcg entirely, before
> calling shrink_lruvec(). This can be even worse with recursive
> protection, where the stale target memcg protection can be higher than
> its standalone protection. See two examples below (a similar version of
> example (a) is added to test_memcontrol in a later patch).
>
> (a) A simple example with proactive reclaim is as follows. Consider the
> following hierarchy:
> ROOT
>  |
>  A
>  |
>  B (memory.min = 10M)
>
> Consider the following scenario:
> - B has memory.current = 10M.
> - The system undergoes global reclaim (or memcg reclaim in A).
> - In shrink_node_memcgs():
>   - mem_cgroup_calculate_protection() calculates the effective min (emin)
>     of B as 10M.
>   - mem_cgroup_below_min() returns true for B, we do not reclaim from B.
> - Now if we want to reclaim 5M from B using proactive reclaim
>   (memory.reclaim), we should be able to, as the protection of the
>   target memcg should be ignored.
> - In shrink_node_memcgs():
>   - mem_cgroup_calculate_protection() immediately returns for B without
>     doing anything, as B is the target memcg, relying on
>     mem_cgroup_protection() to ignore B's stale effective min (still 10M).
>   - mem_cgroup_below_min() reads the stale effective min for B and we
>     skip it instead of ignoring its protection as intended, as we never
>     reach mem_cgroup_protection().
>
> (b) A more complex example with recursive protection is as follows.
> Consider the following hierarchy with memory_recursiveprot:
> ROOT
>  |
>  A (memory.min = 50M)
>  |
>  B (memory.min = 10M, memory.high = 40M)
>
> Consider the following scenario:
> - B has memory.current = 35M.
> - The system undergoes global reclaim (target memcg is NULL).
> - B will have an effective min of 50M (all of A's unclaimed protection).
> - B will not be reclaimed from.
> - Now allocate 10M more memory in B, pushing it above its high limit.
> - The system undergoes memcg reclaim from B (target memcg is B).
> - Like example (a), we do nothing in mem_cgroup_calculate_protection(),
>   then call mem_cgroup_below_min(), which will read the stale effective
>   min for B (50M) and skip it. In this case, it's even worse because we
>   are not just considering B's standalone protection (10M), but we are
>   reading a much higher stale protection (50M) which will cause us to not
>   reclaim from B at all.
>
> This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
> e{low,min} state mutations from protection checks") which made
> mem_cgroup_calculate_protection() only change the state without
> returning any value. Before that commit, we used to return
> MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
> mem_cgroup_below_{min/low}() checks.
> After that commit we do not return anything and we end up checking the
> min & low effective protections for the target memcg, which are stale.
>
> Update mem_cgroup_supports_protection() to also check if we are
> reclaiming from the target, and rename it to mem_cgroup_unprotected()
> (now returns true if we should not protect the memcg, much simpler logic).
>
> Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
> Signed-off-by: Yosry Ahmed

Reviewed-by: Roman Gushchin

Thank you!
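
For anyone following the thread without the diff at hand, here is a minimal
userspace sketch of example (a) from the quoted changelog. It is not the
kernel code: struct toy_memcg and every toy_* function are hypothetical
stand-ins, sizes are plain MiB counts, and the protection calculation is
reduced to the one case where the parent is the reclaim target (effective
min equals memory.min). It only illustrates why a check that reads the cached
emin skips the target, and why testing "memcg == target" first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; all sizes are in MiB. */
struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for the top of the toy tree */
	unsigned long min;		/* memory.min */
	unsigned long emin;		/* cached effective min, may be stale */
	unsigned long usage;		/* memory.current */
};

struct toy_result {
	bool first_pass_skips_b;	/* reclaim in A: B is protected */
	bool old_check_skips_target;	/* reclaim in B, stale emin is used */
	bool new_check_skips_target;	/* reclaim in B, target is unprotected */
};

/*
 * Assumption: only models a memcg whose parent is the reclaim target,
 * where the effective min is simply memory.min. For the target itself
 * nothing is updated, so emin keeps whatever an earlier pass stored.
 */
static void toy_calculate_protection(const struct toy_memcg *target,
				     struct toy_memcg *memcg)
{
	if (memcg == target)
		return;
	assert(memcg->parent == target);
	memcg->emin = memcg->min;
}

/* Check without a target test: trusts the cached emin unconditionally. */
static bool toy_below_min_old(const struct toy_memcg *memcg)
{
	return memcg->emin >= memcg->usage;
}

/* Check with a target test: the reclaim target is never protected. */
static bool toy_below_min_new(const struct toy_memcg *target,
			      const struct toy_memcg *memcg)
{
	if (memcg == target)
		return false;
	return memcg->emin >= memcg->usage;
}

/* Replays example (a): memcg reclaim in A, then proactive reclaim on B. */
struct toy_result toy_example_a(void)
{
	struct toy_memcg a = { .parent = NULL, .min = 0, .emin = 0, .usage = 10 };
	struct toy_memcg b = { .parent = &a, .min = 10, .emin = 0, .usage = 10 };
	struct toy_result r;

	/* Pass 1: target is A, so B's emin is computed and cached (10). */
	toy_calculate_protection(&a, &b);
	r.first_pass_skips_b = toy_below_min_old(&b);

	/* Pass 2: target is B, the calculation returns early, emin stays 10. */
	toy_calculate_protection(&b, &b);
	r.old_check_skips_target = toy_below_min_old(&b);
	r.new_check_skips_target = toy_below_min_new(&b, &b);
	return r;
}
```

Calling toy_example_a() from a small main() shows the first pass protecting
B as expected, the check without the target test skipping B during its own
proactive reclaim, and the check with the target test letting reclaim proceed.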