linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Al Viro <viro@zeniv.linux.org.uk>, Linux MM <linux-mm@kvack.org>,
	 linux-fsdevel@vger.kernel.org,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function
Date: Sun, 5 Jan 2020 09:43:11 +0800	[thread overview]
Message-ID: <CALOAHbBGRSfRTH7RYXfgAqtixuYvu=tRrr7zQyAvofrzktW=vA@mail.gmail.com> (raw)
In-Reply-To: <20200104212331.GG23195@dread.disaster.area>

On Sun, Jan 5, 2020 at 5:23 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Jan 04, 2020 at 03:26:13PM +0800, Yafang Shao wrote:
> > On Sat, Jan 4, 2020 at 11:36 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Tue, Dec 24, 2019 at 02:53:25AM -0500, Yafang Shao wrote:
> > > > The lru walker isolation function may use this memcg to do something, e.g.
> > > > the inode isolatation function will use the memcg to do inode protection in
> > > > followup patch. So make memcg visible to the lru walker isolation function.
> > > >
> > > > Something should be emphasized in this patch is it replaces
> > > > for_each_memcg_cache_index() with for_each_mem_cgroup() in
> > > > list_lru_walk_node(). Because there's a gap between these two MACROs that
> > > > for_each_mem_cgroup() depends on CONFIG_MEMCG while the other one depends
> > > > on CONFIG_MEMCG_KMEM. But as list_lru_memcg_aware() returns false if
> > > > CONFIG_MEMCG_KMEM is not configured, it is safe to this replacement.
> > > >
> > > > Cc: Dave Chinner <dchinner@redhat.com>
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > >
> > > ....
> > >
> > > > @@ -299,17 +299,15 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
> > > >                                list_lru_walk_cb isolate, void *cb_arg,
> > > >                                unsigned long *nr_to_walk)
> > > >  {
> > > > +     struct mem_cgroup *memcg;
> > > >       long isolated = 0;
> > > > -     int memcg_idx;
> > > >
> > > > -     isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > > > -                                   nr_to_walk);
> > > > -     if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
> > > > -             for_each_memcg_cache_index(memcg_idx) {
> > > > +     if (list_lru_memcg_aware(lru)) {
> > > > +             for_each_mem_cgroup(memcg) {
> > > >                       struct list_lru_node *nlru = &lru->node[nid];
> > > >
> > > >                       spin_lock(&nlru->lock);
> > > > -                     isolated += __list_lru_walk_one(nlru, memcg_idx,
> > > > +                     isolated += __list_lru_walk_one(nlru, memcg,
> > > >                                                       isolate, cb_arg,
> > > >                                                       nr_to_walk);
> > > >                       spin_unlock(&nlru->lock);
> > > > @@ -317,7 +315,11 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
> > > >                       if (*nr_to_walk <= 0)
> > > >                               break;
> > > >               }
> > > > +     } else {
> > > > +             isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > > > +                                           nr_to_walk);
> > > >       }
> > > > +
> > >
> > > That's a change of behaviour. The old code always runs per-node
> > > reclaim, then if the LRU is memcg aware it also runs the memcg
> > > aware reclaim. The new code never runs global per-node reclaim
> > > if the list is memcg aware, so shrinkers that are initialised
> > > with the flags SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE seem
> > > likely to have reclaim problems with mixed memcg/global memory
> > > pressure scenarios.
> > >
> > > e.g. if all the memory is in the per-node lists, and the memcg needs
> > > to reclaim memory because of a global shortage, it is now unable to
> > > reclaim global memory.....
> > >
> >
> > Hi Dave,
> >
> > Thanks for your detailed explanation.
> > But I have different understanding.
> > The difference between for_each_mem_cgroup(memcg) and
> > for_each_memcg_cache_index(memcg_idx) is that the
> > for_each_mem_cgroup() includes the root_mem_cgroup while the
> > for_each_memcg_cache_index() excludes the root_mem_cgroup because the
> > memcg_idx of it is -1.
>
> Except that the "root" memcg that for_each_mem_cgroup() is not the
> "global root" memcg - it is whatever memcg that is passed down in
> the shrink_control, whereever that sits in the cgroup tree heirarchy.
> do_shrink_slab() only ever passes down the global root memcg to the
> shrinkers when the global root memcg is passed to shrink_slab(), and
> that does not iterate the memcg heirarchy - it just wants to
> reclaim from global caches an non-memcg aware shrinkers.
>
> IOWs, there are multiple changes in behaviour here - memcg specific
> reclaim won't do global reclaim, and global reclaim will now iterate
> all memcgs instead of just the global root memcg.
>
> > So it can reclaim global memory even if the list is memcg aware.
> > Is that right ?
>
> If the memcg passed to this fucntion is the root memcg, then yes,
> it will behave as you suggest. But for the majority of memcg-context
> reclaim, the memcg is not the root memcg and so they will not do
> global reclaim anymore...
>

Thanks for you reply.
But I have to clairfy that this change is in list_lru_walk_node(), and
the memcg is not passed to this funtion from shrink_control.
In order to make it more clear, I paste the function here.

- The new function
unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                 list_lru_walk_cb isolate, void *cb_arg,
                                 unsigned long *nr_to_walk)
{
        struct mem_cgroup *memcg;   <<<< A local variable
        long isolated = 0;

        if (list_lru_memcg_aware(lru)) {
                for_each_mem_cgroup(memcg) {  <<<<  scan all MEMCGs,
including root_mem_cgroup
                        struct list_lru_node *nlru = &lru->node[nid];

                        spin_lock(&nlru->lock);
                        isolated += __list_lru_walk_one(nlru, memcg,
                                                        isolate, cb_arg,
                                                        nr_to_walk);
                        spin_unlock(&nlru->lock);

                        if (*nr_to_walk <= 0)
                                break;
                }
        } else {
<<<<  scan global memory only (root_mem_cgroup)
                isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
                                              nr_to_walk);
        }

        return isolated;
}

- While the original function is,

unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                 list_lru_walk_cb isolate, void *cb_arg,
                                 unsigned long *nr_to_walk)
{
        long isolated = 0;
        int memcg_idx;

        isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
                                      nr_to_walk);
            <<<<  scan global memory only (root_mem_cgroup)
        if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
                for_each_memcg_cache_index(memcg_idx) {   <<<< scan
all MEMCGs excludes root_mem_cgroup
                        struct list_lru_node *nlru = &lru->node[nid];

                        spin_lock(&nlru->lock);
                        isolated += __list_lru_walk_one(nlru, memcg_idx,
                                                        isolate, cb_arg,
                                                        nr_to_walk);
                        spin_unlock(&nlru->lock);

                        if (*nr_to_walk <= 0)
                                break;
                }
        }

        return isolated;
}

Is that right ?

Thanks
Yafang


  reply	other threads:[~2020-01-05  1:43 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-24  7:53 [PATCH v2 0/5] protect page cache from freeing inode Yafang Shao
2019-12-24  7:53 ` [PATCH v2 1/5] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
2019-12-26 21:23   ` Roman Gushchin
2019-12-27  1:03     ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 2/5] mm, memcg: introduce MEMCG_PROT_SKIP for memcg zero usage case Yafang Shao
2019-12-26 21:36   ` Roman Gushchin
2019-12-27  1:09     ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 3/5] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
2019-12-26 21:45   ` Roman Gushchin
2019-12-27  1:11     ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function Yafang Shao
2020-01-04  3:35   ` Dave Chinner
2020-01-04  7:26     ` Yafang Shao
2020-01-04 21:23       ` Dave Chinner
2020-01-05  1:43         ` Yafang Shao [this message]
2020-01-06  0:17           ` Dave Chinner
2020-01-06 14:41             ` Yafang Shao
2020-01-06 21:31               ` Dave Chinner
2020-01-07 13:22                 ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode Yafang Shao
2019-12-25 13:01   ` kbuild test robot
2019-12-25 13:18   ` kbuild test robot
2019-12-26  5:09     ` Yafang Shao
2020-01-04  3:55   ` Dave Chinner
2020-01-04  7:42     ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALOAHbBGRSfRTH7RYXfgAqtixuYvu=tRrr7zQyAvofrzktW=vA@mail.gmail.com' \
    --to=laoar.shao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox