From: Roman Gushchin <roman.gushchin@linux.dev>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Shakeel Butt <shakeelb@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.com>, Yu Zhao <yuzhao@google.com>,
Muchun Song <songmuchun@bytedance.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Vasily Averin <vasily.averin@linux.dev>,
Vlastimil Babka <vbabka@suse.cz>,
Chris Down <chris@chrisdown.name>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: memcg: fix stale protection of reclaim target memcg
Date: Tue, 22 Nov 2022 17:26:04 -0800
Message-ID: <Y312rG5cq/C6a8ef@P9FQF9L96D.corp.robot.car>
In-Reply-To: <CAJD7tkYfR6Kuq569=0h_crqjpK5cNT_029LuYa-EeCx16gU-6A@mail.gmail.com>

On Tue, Nov 22, 2022 at 04:49:54PM -0800, Yosry Ahmed wrote:
> On Tue, Nov 22, 2022 at 4:45 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Tue, Nov 22, 2022 at 4:37 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > On Tue, Nov 22, 2022 at 11:27:21PM +0000, Yosry Ahmed wrote:
> > > > During reclaim, mem_cgroup_calculate_protection() is used to determine
> > > > the effective protection (emin and elow) values of a memcg. The
> > > > protection of the reclaim target is ignored, but we cannot set its
> > > > effective protection to 0 due to a limitation of the current
> > > > implementation (see comment in mem_cgroup_protection()). Instead,
> > > > we leave its effective protection values unchanged, and later ignore
> > > > them in mem_cgroup_protection().
> > > >
> > > > However, mem_cgroup_protection() is called later in
> > > > shrink_lruvec()->get_scan_count(), which is after the
> > > > mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
> > > > result, the stale effective protection values of the target memcg may
> > > > lead us to skip reclaiming from the target memcg entirely, before
> > > > calling shrink_lruvec(). This can be even worse with recursive
> > > > protection, where the stale target memcg protection can be higher than
> > > > its standalone protection.
> > > >
> > > > An example where this can happen is as follows. Consider the following
> > > > hierarchy with memory_recursiveprot:
> > > > ROOT
> > > > |
> > > > A (memory.min = 50M)
> > > > |
> > > > B (memory.min = 10M, memory.high = 40M)
> > > >
> > > > Consider the following scenario:
> > > > - B has memory.current = 35M.
> > > > - The system undergoes global reclaim (target memcg is NULL).
> > > > - B will have an effective min of 50M (all of A's unclaimed protection).
> > > > - B will not be reclaimed from.
> > > > - Now allocate 10M more memory in B, pushing it above its high limit.
> > > > - The system undergoes memcg reclaim from B (target memcg is B)
> > > > - In shrink_node_memcgs(), we call mem_cgroup_calculate_protection(),
> > > > which immediately returns for B without doing anything, as B is the
> > > > target memcg, relying on mem_cgroup_protection() to ignore B's stale
> > > > effective min (still 50M).
> > > > - Directly after mem_cgroup_calculate_protection(), we will call
> > > > mem_cgroup_below_min(), which will read the stale effective min for B
> > > > and skip it (instead of ignoring its protection as intended). In this
> > > > case, it's really bad because we are not just considering B's
> > > > standalone protection (10M), but we are reading a much higher stale
> > > > protection (50M) which will cause us to not reclaim from B at all.
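
The arithmetic of the scenario above can be replayed in a small userspace sketch. This is not kernel code: struct toy_memcg and both toy_* functions are hypothetical stand-ins, sizes are in MiB, and the check only mirrors the shape of a "cached emin >= usage" comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mem_cgroup (hypothetical); sizes are in MiB. */
struct toy_memcg {
	unsigned long usage;	/* memory.current */
	unsigned long emin;	/* cached effective min protection */
};

/* Same shape as the pre-fix check: only the cached emin is consulted. */
static inline bool toy_below_min_stale(const struct toy_memcg *memcg)
{
	return memcg->emin >= memcg->usage;
}

/* Replays the scenario; returns true if B is skipped by memcg reclaim. */
static inline bool toy_scenario_skips_target(void)
{
	/* Global reclaim left B with emin = 50 (A's unclaimed protection). */
	struct toy_memcg b = { .usage = 35, .emin = 50 };

	/* Global reclaim skips B, as intended. */
	assert(toy_below_min_stale(&b));

	/* B allocates 10 more and is now at 45, above memory.high = 40. */
	b.usage += 10;

	/* Reclaim targets B; emin is not recalculated, so 50 >= 45 holds. */
	return toy_below_min_stale(&b);
}
```

In this model toy_scenario_skips_target() returns true: the target is skipped even though it is the very memcg that reclaim was asked to shrink.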
> > > >
> > > > This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
> > > > e{low,min} state mutations from protection checks") which made
> > > > mem_cgroup_calculate_protection() only change the state without
> > > > returning any value. Before that commit, we used to return
> > > > MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
> > > > mem_cgroup_below_{min/low}() checks. After that commit we do not return
> > > > anything and we end up checking the min & low effective protections for
> > > > the target memcg, which are stale.
> > > >
> > > > Add mem_cgroup_ignore_protection() that checks if we are reclaiming from
> > > > the target memcg, and call it in mem_cgroup_below_{min/low}() to ignore
> > > > the stale protection of the target memcg.
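
A minimal sketch of the idea described above, under the assumption that "ignoring" simply means comparing the memcg against the reclaim target before looking at the cached value. The toy_* names and struct are hypothetical and only model the logic; the real helpers live in include/linux/memcontrol.h and may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mem_cgroup (hypothetical); sizes are in MiB. */
struct toy_memcg {
	unsigned long usage;	/* memory.current */
	unsigned long emin;	/* cached effective min, possibly stale */
};

/* True when memcg is the reclaim target, whose protection must not apply. */
static inline bool toy_ignore_protection(const struct toy_memcg *target,
					 const struct toy_memcg *memcg)
{
	return target == memcg;
}

/* Target-aware check: the stale emin is never read for the target. */
static inline bool toy_below_min(const struct toy_memcg *target,
				 const struct toy_memcg *memcg)
{
	if (toy_ignore_protection(target, memcg))
		return false;

	return memcg->emin >= memcg->usage;
}
```

In this model a NULL target stands for global reclaim, so a memcg with usage = 45 and a stale emin = 50 is still protected from global reclaim but is no longer skipped when it is itself the target.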
> > > >
> > > > Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > >
> > > Great catch!
> > > The fix looks good to me, only a couple of cosmetic suggestions.
> > >
> > > > ---
> > > > include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
> > > > mm/vmscan.c | 11 ++++++-----
> > > > 2 files changed, 33 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > > index e1644a24009c..22c9c9f9c6b1 100644
> > > > --- a/include/linux/memcontrol.h
> > > > +++ b/include/linux/memcontrol.h
> > > > @@ -625,18 +625,32 @@ static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg)
> > > >
> > > > }
> > > >
> > > > -static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
> > > > +static inline bool mem_cgroup_ignore_protection(struct mem_cgroup *target,
> > > > + struct mem_cgroup *memcg)
> > > > {
> > > > - if (!mem_cgroup_supports_protection(memcg))
> > >
> > > How about merging mem_cgroup_supports_protection() and your new helper into
> > > something like mem_cgroup_possibly_protected()? It seems they are never used
> > > separately and are unlikely to ever be.
> >
> > Sounds good! I am thinking maybe mem_cgroup_no_protection() which is
> > an inlining of !mem_cgroup_supports_protection() ||
> > mem_cgroup_ignore_protection().
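
For illustration, the merged predicate could look roughly like the sketch below. It assumes that mem_cgroup_supports_protection() amounts to "memcg is enabled and this is not the root memcg"; that body is not shown in this thread, so treat it as an assumption. All toy_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mem_cgroup (hypothetical). */
struct toy_memcg {
	bool is_root;
};

/* Stand-in for mem_cgroup_disabled(); assumed false here. */
static bool toy_memcg_disabled;

/* Assumed meaning of "supports protection": enabled and not the root. */
static inline bool toy_supports_protection(const struct toy_memcg *memcg)
{
	return !toy_memcg_disabled && !memcg->is_root;
}

/* Merged helper: no protection if unsupported or if memcg is the target. */
static inline bool toy_no_protection(const struct toy_memcg *target,
				     const struct toy_memcg *memcg)
{
	return !toy_supports_protection(memcg) || memcg == target;
}
```

With "target" kept as the first argument, a call such as toy_no_protection(target, memcg) reads the same way as the neighbouring mem_cgroup_calculate_protection(target, memcg) call.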
> >
> > > Also, I'd swap target and memcg arguments.
> >
> > Sounds good.
>
> I just remembered, the reason I put "target" first is to match the
> ordering of mem_cgroup_calculate_protection(), otherwise the code in
> shrink_node_memcgs() may be confusing.
Oh, I see...
Never mind, let's leave it the way it is now.
Thanks for checking it out!
Roman