From: "Bruno Prémont" <bonbons@linux-vserver.org>
To: Michal Hocko <mhocko@suse.com>
Cc: Yafang Shao <laoar.shao@gmail.com>,
Chris Down <chris@chrisdown.name>,
Johannes Weiner <hannes@cmpxchg.org>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: Regression from 5.7.17 to 5.9.9 with memory.low cgroup constraints
Date: Wed, 25 Nov 2020 15:33:50 +0100
Message-ID: <20201125153350.0af98d93@hemera>
In-Reply-To: <20201125133740.GE31550@dhcp22.suse.cz>
Hi Michal,
On Wed, 25 Nov 2020 14:37:40 +0100 Michal Hocko <mhocko@suse.com> wrote:
> Hi,
> thanks for the detailed report.
>
> On Wed 25-11-20 12:39:56, Bruno Prémont wrote:
> [...]
> > Did memory.low meaning change between 5.7 and 5.9?
>
> The latest change in the low limit protection semantics was
> introduced in 5.7 (recursive protection) but it requires explicit
> enabling.
No specific mount options set for v2 cgroup, so not active.
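
For reference, recursive protection is controlled by the
memory_recursiveprot mount option of the cgroup v2 filesystem. A minimal
sketch of how one might check for it, assuming the mount table is read
from /proc/mounts; the function names are hypothetical helpers, not part
of any kernel tooling:

```python
def cgroup2_mount_options(mounts_text):
    """Return the option set of the first cgroup2 mount, or None if absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup2":
            return set(fields[3].split(","))
    return None


def recursive_protection_enabled(mounts_text):
    """True if a cgroup2 mount lists the memory_recursiveprot option."""
    opts = cgroup2_mount_options(mounts_text)
    return opts is not None and "memory_recursiveprot" in opts
```

On a live system the input would be the content of /proc/mounts, e.g.
recursive_protection_enabled(open("/proc/mounts").read()).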
> > From behavior it
> > feels as if inodes are not accounted to cgroup at all and kernel pushes
> > cgroups down to their memory.low by killing file cache if there is not
> > enough free memory to hold all promises (and not only when a cgroup
> > tries to use up to its promised amount of memory).
>
> Your counters indeed show that the low protection has been breached,
> most likely because the reclaim couldn't make any progress. Considering
> that this is the case for all/most of your cgroups it suggests that the
> memory pressure was global rather than limit imposed. In fact even top
> level cgroups got reclaimed below the low limit.
Note that the "original" counters were partially triggered by a first
event where I had one cgroup (websrv) with a rather high
memory.low (16G or even 32G), which caused counters everywhere to
increase.
So before the last thrashing, during which the values were collected, the
event counters and `current` looked as follows:
system/memory.pressure
some avg10=0.04 avg60=0.28 avg300=0.12 total=5844917510
full avg10=0.04 avg60=0.26 avg300=0.11 total=2439353404
system/memory.current
96432128
system/memory.events.local
low 5399469 (unchanged)
high 0
max 112303 (unchanged)
oom 0
oom_kill 0
system/base/memory.pressure
some avg10=0.04 avg60=0.28 avg300=0.12 total=4589562039
full avg10=0.04 avg60=0.28 avg300=0.12 total=1926984197
system/base/memory.current
59305984
system/base/memory.events.local
low 0 (unchanged)
high 0
max 0 (unchanged)
oom 0
oom_kill 0
system/backup/memory.pressure
some avg10=0.00 avg60=0.00 avg300=0.00 total=2123293649
full avg10=0.00 avg60=0.00 avg300=0.00 total=815450446
system/backup/memory.current
32444416
system/backup/memory.events.local
low 5446 (unchanged)
high 0
max 0
oom 0
oom_kill 0
system/shell/memory.pressure
some avg10=0.00 avg60=0.00 avg300=0.00 total=1345965660
full avg10=0.00 avg60=0.00 avg300=0.00 total=492812915
system/shell/memory.current
4571136
system/shell/memory.events.local
low 0
high 0
max 0
oom 0
oom_kill 0
website/memory.pressure
some avg10=0.00 avg60=0.00 avg300=0.00 total=415008878
full avg10=0.00 avg60=0.00 avg300=0.00 total=201868483
website/memory.current
12104380416
website/memory.events.local
low 11264569 (during thrashing: 11372142 then 11377350)
high 0
max 0
oom 0
oom_kill 0
remote/memory.pressure
some avg10=0.00 avg60=0.00 avg300=0.00 total=2005130126
full avg10=0.00 avg60=0.00 avg300=0.00 total=735366752
remote/memory.current
116330496
remote/memory.events.local
low 11264569 (during thrashing: 11372142 then 11377350)
high 0
max 0
oom 0
oom_kill 0
websrv/memory.pressure
some avg10=0.02 avg60=0.11 avg300=0.03 total=6650355162
full avg10=0.02 avg60=0.11 avg300=0.03 total=2034584579
websrv/memory.current
18483359744
websrv/memory.events.local
low 0
high 0
max 0
oom 0
oom_kill 0
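
To compare such dumps between runs, the two file formats above can be
parsed mechanically. A minimal sketch, assuming the flat "key value"
layout of memory.events.local and the "some/full avgN=... total=..."
layout of memory.pressure as shown; the function names are hypothetical:

```python
def parse_events(text):
    """Parse 'key value' lines such as those in memory.events.local.

    Trailing annotations after the number (e.g. '(unchanged)') are ignored.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def parse_pressure(text):
    """Parse PSI lines such as 'some avg10=0.04 avg60=0.28 ... total=123'."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        values = {}
        for pair in fields[1:]:
            key, sep, value = pair.partition("=")
            if not sep or not value:
                continue  # skip anything that is not key=value
            # 'total' is a cumulative counter; the averages are percentages
            values[key] = int(value) if key == "total" else float(value)
        result[fields[0]] = values
    return result
```

Subtracting the "low" counter of two parsed snapshots gives the number of
low events between them.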
> This suggests that this is not likely to be memcg specific. It is
> more likely that this is a general memory reclaim regression for your
> workload. There were larger changes in that area. Be it lru balancing
> based on cost model by Johannes or working set tracking for anonymous
> pages by Joonsoo. Maybe even more. Both of them can influence page cache
> reclaim but you are suggesting that slab accounted memory is not
> reclaimed properly.
That is my impression, yes. I have no idea, though, whether memcg can
influence the way reclaim performs its work, or whether slab_reclaimable
memory not associated with any (child) cgroup would somehow be excluded
from reclaim.
> I am not sure there were considerable changes
> there. Would it be possible to collect /proc/vmstat as well?
I will have a look at gathering memory.stat and /proc/vmstat at the next
opportunity.
Will first try with a test system with not too much memory and lots of
files to reproduce about 50% of memory usage by slab_reclaimable and
see how far I get.
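
Gathering memory.stat and /proc/vmstat is mostly useful as a before/after
comparison. A minimal sketch, assuming both files use flat "name value"
lines and that two snapshots are taken around the thrashing; the helper
names are hypothetical, and the counter names in the usage note are only
examples:

```python
def parse_flat_stats(text):
    """Parse 'name value' lines as found in /proc/vmstat or memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            stats[parts[0]] = int(parts[1])
        except ValueError:
            pass  # ignore lines whose value is not an integer
    return stats


def diff_counters(before, after):
    """Return {name: after - before} for counters present in both and changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

For example, diff_counters(parse_flat_stats(snapshot_a),
parse_flat_stats(snapshot_b)) shows how much nr_slab_reclaimable or the
pgscan/pgsteal counters moved between the two snapshots.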
Thanks,
Bruno