linux-mm.kvack.org archive mirror
From: Yosry Ahmed <yosryahmed@google.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@suse.com>, Yu Zhao <yuzhao@google.com>,
	 Muchun Song <songmuchun@bytedance.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	 Vasily Averin <vasily.averin@linux.dev>,
	Vlastimil Babka <vbabka@suse.cz>,
	 Chris Down <chris@chrisdown.name>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 1/3] mm: memcg: fix stale protection of reclaim target memcg
Date: Wed, 23 Nov 2022 16:57:28 -0800
Message-ID: <CAJD7tkY4QtVTJe5cxSKzKj0gOROD4a+o=Rt-wfvG1gcxSQC8Pg@mail.gmail.com>
In-Reply-To: <Y369cNnRWkoymF1G@P9FQF9L96D.corp.robot.car>

On Wed, Nov 23, 2022 at 4:40 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Nov 23, 2022 at 09:21:30AM +0000, Yosry Ahmed wrote:
> > During reclaim, mem_cgroup_calculate_protection() is used to determine
> > the effective protection (emin and elow) values of a memcg. The
> > protection of the reclaim target is ignored, but we cannot set its
> > effective protection to 0 due to a limitation of the current
> > implementation (see comment in mem_cgroup_protection()). Instead,
> > we leave its effective protection values unchanged, and later ignore
> > them in mem_cgroup_protection().
> >
> > However, mem_cgroup_protection() is called later in
> > shrink_lruvec()->get_scan_count(), which is after the
> > mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
> > result, the stale effective protection values of the target memcg may
> > lead us to skip reclaiming from the target memcg entirely, before
> > calling shrink_lruvec(). This can be even worse with recursive
> > protection, where the stale target memcg protection can be higher than
> > its standalone protection. See two examples below (a similar version of
> > example (a) is added to test_memcontrol in a later patch).
> >
> > (a) A simple example with proactive reclaim is as follows. Consider the
> > following hierarchy:
> > ROOT
> >  |
> >  A
> >  |
> >  B (memory.min = 10M)
> >
> > Consider the following scenario:
> > - B has memory.current = 10M.
> > - The system undergoes global reclaim (or memcg reclaim in A).
> > - In shrink_node_memcgs():
> >   - mem_cgroup_calculate_protection() calculates the effective min (emin)
> >     of B as 10M.
> >   - mem_cgroup_below_min() returns true for B, so we do not reclaim from B.
> > - Now if we want to reclaim 5M from B using proactive reclaim
> >   (memory.reclaim), we should be able to, as the protection of the
> >   target memcg should be ignored.
> > - In shrink_node_memcgs():
> >   - mem_cgroup_calculate_protection() immediately returns for B without
> >     doing anything, as B is the target memcg, relying on
> >     mem_cgroup_protection() to ignore B's stale effective min (still 10M).
> >   - mem_cgroup_below_min() reads the stale effective min for B and we
> >     skip it instead of ignoring its protection as intended, as we never
> >     reach mem_cgroup_protection().
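
The sequence in example (a) can be replayed with a small userspace model. This is only a sketch under stated assumptions: struct memcg_model and the model_* helpers are hypothetical stand-ins, not kernel code, and the effective min is simplified to a copy of memory.min with no hierarchy walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MB (1024UL * 1024UL)

/* Hypothetical, heavily simplified stand-in for struct mem_cgroup. */
struct memcg_model {
	unsigned long min;	/* memory.min */
	unsigned long emin;	/* cached effective min */
	unsigned long usage;	/* memory.current */
};

/* Models the early return: nothing is recomputed for the reclaim target. */
void model_calculate_protection(const struct memcg_model *target,
				struct memcg_model *memcg)
{
	if (memcg == target)
		return;	/* emin keeps whatever an earlier pass stored */

	/* Assumption: no parent distribution, no recursive protection. */
	memcg->emin = memcg->min;
}

/* Models the check before the fix: it does not know who the target is. */
bool model_below_min_prefix(const struct memcg_model *memcg)
{
	return memcg->emin >= memcg->usage;
}

/* Replays example (a); returns true if proactive reclaim on B skips B. */
bool example_a_skips_target(void)
{
	struct memcg_model b = { .min = 10 * MB, .emin = 0, .usage = 10 * MB };

	/* Global reclaim: target is NULL, so B's emin is computed as 10M. */
	model_calculate_protection(NULL, &b);
	assert(model_below_min_prefix(&b));	/* B is protected, as intended */

	/* Proactive reclaim on B: B is the target, so emin is left stale. */
	model_calculate_protection(&b, &b);

	/* The stale 10M emin still covers B's 10M usage, so B is skipped. */
	return model_below_min_prefix(&b);
}
```

In this model example_a_skips_target() returns true, which is the unintended skip described above.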
> >
> > (b) A more complex example with recursive protection is as follows.
> > Consider the following hierarchy with memory_recursiveprot:
> > ROOT
> >  |
> >  A (memory.min = 50M)
> >  |
> >  B (memory.min = 10M, memory.high = 40M)
> >
> > Consider the following scenario:
> > - B has memory.current = 35M.
> > - The system undergoes global reclaim (target memcg is NULL).
> > - B will have an effective min of 50M (all of A's unclaimed protection).
> > - B will not be reclaimed from.
> > - Now allocate 10M more memory in B, pushing it above its high limit.
> > - The system undergoes memcg reclaim from B (target memcg is B).
> > - Like example (a), we do nothing in mem_cgroup_calculate_protection(),
> >   then call mem_cgroup_below_min(), which will read the stale effective
> >   min for B (50M) and skip it. In this case, it's even worse because we
> >   are not just considering B's standalone protection (10M), but we are
> >   reading a much higher stale protection (50M) which will cause us to not
> >   reclaim from B at all.
> >
> > This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
> > e{low,min} state mutations from protection checks") which made
> > mem_cgroup_calculate_protection() only change the state without
> > returning any value. Before that commit, we used to return
> > MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
> > mem_cgroup_below_{min/low}() checks. After that commit we do not return
> > anything and we end up checking the min & low effective protections for
> > the target memcg, which are stale.
> >
> > Update mem_cgroup_supports_protection() to also check if we are
> > reclaiming from the target, and rename it to mem_cgroup_unprotected()
> > (now returns true if we should not protect the memcg, much simpler logic).
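
A minimal userspace sketch of the idea behind the renamed helper follows. It is an illustration, not the patch itself: struct memcg_model, the is_root flag and the model_* functions are hypothetical, and any other condition the kernel helper checks is left out. The numbers come from example (b), where B holds a stale effective min of 50M and uses 45M.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MB (1024UL * 1024UL)

/* Hypothetical, simplified stand-in for struct mem_cgroup. */
struct memcg_model {
	bool is_root;		/* the root memcg has no protection */
	unsigned long emin;	/* cached effective min, possibly stale */
	unsigned long usage;	/* memory.current */
};

/* True if protection must not be applied to memcg for this reclaim pass. */
bool model_unprotected(const struct memcg_model *target,
		       const struct memcg_model *memcg)
{
	return memcg->is_root || memcg == target;
}

/* The check now takes the target, so stale values of the target are unused. */
bool model_below_min(const struct memcg_model *target,
		     const struct memcg_model *memcg)
{
	if (model_unprotected(target, memcg))
		return false;

	return memcg->emin >= memcg->usage;
}

/* Example (b) after the extra 10M allocation: stale emin 50M, usage 45M. */
bool example_b_target_is_skipped(void)
{
	struct memcg_model b = {
		.is_root = false, .emin = 50 * MB, .usage = 45 * MB,
	};

	/* Global reclaim (target NULL) still honours B's protection. */
	assert(model_below_min(NULL, &b));

	/* Memcg reclaim targeting B ignores the stale 50M value. */
	return model_below_min(&b, &b);
}
```

Here example_b_target_is_skipped() returns false, so B is reclaimed from when it is the target, while global reclaim keeps treating B as protected.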
> >
> > Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> Thank you!

Thanks for reviewing!

Do you think we need a CC to stable here?



Thread overview: 13+ messages
2022-11-23  9:21 [PATCH v2 0/3] mm: memcg: fix " Yosry Ahmed
2022-11-23  9:21 ` [PATCH v2 1/3] mm: memcg: fix stale " Yosry Ahmed
2022-11-24  0:40   ` Roman Gushchin
2022-11-24  0:57     ` Yosry Ahmed [this message]
2022-11-23  9:21 ` [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until() Yosry Ahmed
2022-11-24  1:03   ` Roman Gushchin
2022-11-24  3:16     ` Yosry Ahmed
2022-11-29 19:42       ` Yosry Ahmed
2022-11-30 17:19         ` Roman Gushchin
2022-11-30 18:25           ` Yosry Ahmed
2022-12-02  3:19             ` Yosry Ahmed
2022-11-23  9:21 ` [PATCH v2 3/3] selftests: cgroup: make sure reclaim target memcg is unprotected Yosry Ahmed
2022-11-24  1:04   ` Roman Gushchin
