Date: Wed, 25 Nov 2020 15:33:50 +0100
From: Bruno Prémont
To: Michal Hocko
Cc: Yafang Shao, Chris Down, Johannes Weiner, cgroups@vger.kernel.org,
 linux-mm@kvack.org, Vladimir Davydov
Subject: Re: Regression from 5.7.17 to 5.9.9 with memory.low cgroup constraints
Message-ID: <20201125153350.0af98d93@hemera>
In-Reply-To: <20201125133740.GE31550@dhcp22.suse.cz>
References: <20201125123956.61d9e16a@hemera>
 <20201125133740.GE31550@dhcp22.suse.cz>

Hi Michal,

On Wed, 25 Nov 2020 14:37:40 +0100 Michal Hocko wrote:
> Hi,
> thanks for the detailed report.
>
> On Wed 25-11-20 12:39:56, Bruno Prémont wrote:
> [...]
> > Did memory.low meaning change between 5.7 and 5.9?
>
> The latest semantic change to the low limit protection semantics was
> introduced in 5.7 (recursive protection), but it requires explicit
> enabling.

No specific mount options are set for the v2 cgroup hierarchy, so that
is not active.
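For reference, the check can be done along these lines. This is only a
minimal sketch (not something from the setup discussed here), assuming
the v2 hierarchy shows up as fstype "cgroup2" in the standard
/proc/mounts layout and that the opt-in is the memory_recursiveprot
mount option mentioned above:

# Minimal sketch: check whether the cgroup2 hierarchy was mounted with
# "memory_recursiveprot", the opt-in for the recursive memory.low
# protection added in 5.7.  Assumes the standard /proc/mounts layout:
# <source> <mountpoint> <fstype> <options> <dump> <pass>
def cgroup2_recursiveprot_enabled(mounts="/proc/mounts"):
    with open(mounts) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "cgroup2":
                return "memory_recursiveprot" in fields[3].split(",")
    return False  # no cgroup2 mount found

if __name__ == "__main__":
    print("memory_recursiveprot enabled:", cgroup2_recursiveprot_enabled())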
> > From the behavior it
> > feels as if inodes are not accounted to a cgroup at all and the kernel
> > pushes cgroups down to their memory.low by killing file cache if there
> > is not enough free memory to hold all promises (and not only when a
> > cgroup tries to use up to its promised amount of memory).
>
> Your counters indeed show that the low protection has been breached,
> most likely because the reclaim couldn't make any progress. Considering
> that this is the case for all/most of your cgroups it suggests that the
> memory pressure was global rather than limit imposed. In fact even top
> level cgroups got reclaimed below the low limit.

Note that the "original" counters were partially triggered by a first
event where I had one cgroup (websrv) with a rather high memory.low
(16G or even 32G), which caused counters everywhere to increase.

So before the last thrashing episode, during which the values were
collected, the event counters and `current` looked as follows:

system/memory.pressure
  some avg10=0.04 avg60=0.28 avg300=0.12 total=5844917510
  full avg10=0.04 avg60=0.26 avg300=0.11 total=2439353404
system/memory.current
  96432128
system/memory.events.local
  low      5399469  (unchanged)
  high     0
  max      112303   (unchanged)
  oom      0
  oom_kill 0

system/base/memory.pressure
  some avg10=0.04 avg60=0.28 avg300=0.12 total=4589562039
  full avg10=0.04 avg60=0.28 avg300=0.12 total=1926984197
system/base/memory.current
  59305984
system/base/memory.events.local
  low      0  (unchanged)
  high     0
  max      0  (unchanged)
  oom      0
  oom_kill 0

system/backup/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=2123293649
  full avg10=0.00 avg60=0.00 avg300=0.00 total=815450446
system/backup/memory.current
  32444416
system/backup/memory.events.local
  low      5446  (unchanged)
  high     0
  max      0
  oom      0
  oom_kill 0

system/shell/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=1345965660
  full avg10=0.00 avg60=0.00 avg300=0.00 total=492812915
system/shell/memory.current
  4571136
system/shell/memory.events.local
  low      0
  high     0
  max      0
  oom      0
  oom_kill 0

website/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=415008878
  full avg10=0.00 avg60=0.00 avg300=0.00 total=201868483
website/memory.current
  12104380416
website/memory.events.local
  low      11264569  (during thrashing: 11372142 then 11377350)
  high     0
  max      0
  oom      0
  oom_kill 0

remote/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=2005130126
  full avg10=0.00 avg60=0.00 avg300=0.00 total=735366752
remote/memory.current
  116330496
remote/memory.events.local
  low      11264569  (during thrashing: 11372142 then 11377350)
  high     0
  max      0
  oom      0
  oom_kill 0

websrv/memory.pressure
  some avg10=0.02 avg60=0.11 avg300=0.03 total=6650355162
  full avg10=0.02 avg60=0.11 avg300=0.03 total=2034584579
websrv/memory.current
  18483359744
websrv/memory.events.local
  low      0
  high     0
  max      0
  oom      0
  oom_kill 0

> This suggests that this is not likely to be memcg specific. It is
> more likely that this is a general memory reclaim regression for your
> workload. There were larger changes in that area, be it LRU balancing
> based on a cost model by Johannes or working set tracking for anonymous
> pages by Joonsoo, and maybe more. Both of them can influence page cache
> reclaim, but you are suggesting that slab-accounted memory is not
> reclaimed properly.

That is my impression, yes.
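For reference, the kind of snapshot quoted above can be gathered with
something along these lines. This is only a rough sketch: the mount
point /sys/fs/cgroup and the group names are taken from this report,
everything else is an arbitrary choice:

# Rough sketch of how the per-cgroup snapshot above could be gathered.
# Assumes the v2 hierarchy is mounted at /sys/fs/cgroup; the group names
# are the ones from this report.
import os

ROOT = "/sys/fs/cgroup"
GROUPS = ["system", "system/base", "system/backup", "system/shell",
          "website", "remote", "websrv"]
FILES = ["memory.pressure", "memory.current", "memory.events.local"]

def snapshot():
    for group in GROUPS:
        for name in FILES:
            path = os.path.join(ROOT, group, name)
            try:
                with open(path) as f:
                    content = f.read().rstrip()
            except OSError:
                continue  # group or file missing on this kernel/config
            print(f"{group}/{name}")
            for line in content.splitlines():
                print(f"  {line}")

if __name__ == "__main__":
    snapshot()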
No idea though whether memcg can influence the way reclaim tries to
perform its work, or whether slab_reclaimable memory not associated with
any (child) cgroup would somehow be excluded from reclaim.

> I am not sure there were considerable changes
> there. Would it be possible to collect /proc/vmstat as well?

I will have a look at gathering memory.stat and /proc/vmstat at the next
opportunity; a rough collection sketch follows below.

I will first try with a test system that does not have too much memory
but has lots of files, to reproduce roughly 50% of memory usage being
slab_reclaimable, and see how far I get.

Thanks,
Bruno
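A rough sketch of the planned collection: append /proc/vmstat plus a few
cgroups' memory.stat to a log every few seconds so reclaim behaviour
around a thrashing episode can be looked at afterwards. Interval, log
path and cgroup list are placeholders, not anything agreed in the
thread:

# Rough sketch for the follow-up collection: periodically append
# /proc/vmstat and a few cgroups' memory.stat to a log file.
# Interval, log path and cgroup list are placeholders.
import time

CGROUPS = ["/sys/fs/cgroup/system",
           "/sys/fs/cgroup/website",
           "/sys/fs/cgroup/websrv"]

def dump(path, out):
    out.write(f"==== {path} @ {time.time():.0f}\n")
    try:
        with open(path) as f:
            out.write(f.read())
    except OSError as err:
        out.write(f"(unreadable: {err})\n")

def collect(interval=10, logfile="/tmp/vmstat-memcg.log"):
    with open(logfile, "a") as out:
        while True:
            dump("/proc/vmstat", out)
            for cg in CGROUPS:
                dump(cg + "/memory.stat", out)
            out.flush()
            time.sleep(interval)

if __name__ == "__main__":
    collect()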