From: Roman Gushchin <guro@fb.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
Date: Sat, 17 Aug 2019 19:14:24 +0000
Message-ID: <20190817191419.GA11125@castle>
In-Reply-To: <CALOAHbBsMNLN6jZn83zx6EWM_092s87zvDQ7p-MZpY+HStk-1Q@mail.gmail.com>

On Sat, Aug 17, 2019 at 11:33:57AM +0800, Yafang Shao wrote:
> On Sat, Aug 17, 2019 at 8:47 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
> > with the hierarchical ones") effectively decreased the precision of
> > per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
> >
> > That's fine for displaying stats in memory.stat, but it introduces a
> > serious regression into the reclaim process.
> >
> > One issue I've discovered and debugged is the following:
> > lruvec_lru_size() can return 0 instead of the actual number of pages
> > on the lru list, preventing the kernel from reclaiming the last
> > remaining pages, so we end up with yet another flood of dying memory
> > cgroups. The opposite also happens: scanning an empty lru list
> > is a waste of cpu time.
> >
> > Also, inactive_list_is_low() can return incorrect values, preventing
> > the active lru from being scanned and freed. It can fail both because
> > the sizes of the active and inactive lists are inaccurate and because
> > the number of workingset refaults isn't precise. In other words,
> > the result is pretty random.
> >
> > I'm not sure whether using the approximate number of slab pages in
> > count_shadow_nodes() is acceptable, but the issues described above
> > are enough to justify partially reverting the patch.
> >
> > Let's keep the per-memcg vmstat_local counters batched (they are only
> > used for displaying stats to userspace), but keep the lruvec stats
> > precise. This change fixes the dying memcg flooding on my setup.
> >
> 
> That will cause some misunderstanding if the local counters are not in
> sync with the hierarchical ones
> (someone may suspect that something is being leaked).

Sure, but the actual leakage is a much more serious issue.
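
To make the failure mode concrete, here is a toy userspace model of the
batched scheme (illustration only, not kernel code; the struct and
function names below are made up). With the kernel's batch size of 32,
a memcg holding only a handful of pages on an lru keeps reporting a
local count of 0, which is exactly what confuses lruvec_lru_size():

#include <stdio.h>
#include <stdlib.h>

#define MEMCG_CHARGE_BATCH 32	/* same batch size the kernel uses */

/* Hypothetical, simplified model of one lruvec counter. */
struct lruvec_counter {
	long percpu_delta;	/* per-cpu delta, not yet flushed */
	long local;		/* "lruvec_stat_local": what lru size checks read */
	long hierarchical;	/* "lruvec_stat": propagated up the memcg tree */
};

/*
 * Batched update, modeling the behavior introduced by 766a4c19d880:
 * the local counter only advances once the per-cpu delta crosses the
 * batch size.
 */
static void mod_batched(struct lruvec_counter *c, long val)
{
	long x = val + c->percpu_delta;

	if (labs(x) > MEMCG_CHARGE_BATCH) {
		c->local += x;
		c->hierarchical += x;
		x = 0;
	}
	c->percpu_delta = x;
}

int main(void)
{
	struct lruvec_counter c = { 0, 0, 0 };
	int i;

	/* Charge 10 pages: below the batch size, so the local counter
	 * that lru size calculations rely on still reads 0. */
	for (i = 0; i < 10; i++)
		mod_batched(&c, 1);

	printf("pages on lru: 10, local counter reports: %ld\n", c.local);
	return 0;
}

Running it prints "pages on lru: 10, local counter reports: 0".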

> If we have to do it like this, I think we should document this behavior better.

LRU size calculations can be done using per-zone counters, which is
actually cheaper because the number of zones is usually smaller than
the number of cpus. I'll send a corresponding patch on Monday.
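
A rough sketch of the direction (just an illustration, not the patch
itself): lruvec_lru_size() could sum the exact per-zone counts, assuming
a helper like mem_cgroup_get_zone_lru_size(), which reads the per-zone
lru_zone_size[] kept in struct mem_cgroup_per_node:

/* Sketch: compute the lru size from per-zone counters instead of the
 * batched per-cpu node counter. */
static unsigned long lruvec_lru_size_per_zone(struct lruvec *lruvec,
					      enum lru_list lru, int zone_idx)
{
	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
	unsigned long size = 0;
	int zid;

	for (zid = 0; zid <= zone_idx; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (!managed_zone(zone))
			continue;

		if (!mem_cgroup_disabled())
			size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
		else
			size += zone_page_state(zone, NR_ZONE_LRU_BASE + lru);
	}

	return size;
}

The per-zone counters are updated exactly (under the lru lock) rather
than through the per-cpu batching, so the precision problem goes away
for the lru size checks.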

Maybe other use cases can also be converted?

Thanks!

> 
> > Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/memcontrol.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 249187907339..3429340adb56 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -746,15 +746,13 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
> >         /* Update memcg */
> >         __mod_memcg_state(memcg, idx, val);
> >
> > +       /* Update lruvec */
> > +       __this_cpu_add(pn->lruvec_stat_local->count[idx], val);
> > +
> >         x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
> >         if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
> >                 struct mem_cgroup_per_node *pi;
> >
> > -               /*
> > -                * Batch local counters to keep them in sync with
> > -                * the hierarchical ones.
> > -                */
> > -               __this_cpu_add(pn->lruvec_stat_local->count[idx], x);
> >                 for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
> >                         atomic_long_add(x, &pi->lruvec_stat[idx]);
> >                 x = 0;
> > --
> > 2.21.0
> >


