linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Linux MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function
Date: Tue, 7 Jan 2020 08:31:03 +1100	[thread overview]
Message-ID: <20200106213103.GJ23195@dread.disaster.area> (raw)
In-Reply-To: <CALOAHbD31GmGz17kNCOvw2kDvZE43=eAVT=1ww_+AF2T-R6A2w@mail.gmail.com>

On Mon, Jan 06, 2020 at 10:41:22PM +0800, Yafang Shao wrote:
> On Mon, Jan 6, 2020 at 8:17 AM Dave Chinner <david@fromorbit.com> wrote:
> > We can clean up the code a lot by getting rid of the unnecessary
> > indenting by doing this:
> >
> >         /* iterate the global lru first */
> >         isolated = list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> >                                         nr_to_walk);
> >         if (!list_lru_memcg_aware(lru))
> >                 return isolated;
> >
> >         nlru = &lru->node[nid];
> >         for_each_mem_cgroup(memcg) {
> >                 /* already scanned the root memcg above */
> >                 if (is_root_memcg(memcg))
> >                         continue;
> >                 if (*nr_to_walk <= 0)
> >                         break;
> >                 spin_lock(&nlru->lock);
> >                 isolated += __list_lru_walk_one(nlru, memcg,
> >                                                 isolate, cb_arg,
> >                                                 nr_to_walk);
> >                 spin_unlock(&nlru->lock);
> >         }
> >         return isolated;
> >
> 
> That's eaiser to understand.
> I wil change it like this in next version.

Thanks!

> >
> > And so to architecture... This all makes me wonder why we still
> > special case the memcg LRU lists here.
> 
> Can't agree more.
> The first time I read this code, I wondered why not assign an
> non-negtive number to kmemcg_id of the root_mem_cgroup and then use
> memcg_lrus as well.

Yeah, I've been wondering why the ID is -1 instead of 0 when we
have a global variable that stores the root memcg that we can
compare pointers directly against via is_root_memcg(). all it seems
to do is make things more complex by having to special case the root
memcg....

From that perspective, I do like your change to use the memcg
iterator functions rather than a for loop over the range of indexes,
but I'd much prefer to see this method used consistently everywhere
rather than the way we've duplicated lots of code by tacking memcgs
onto the side of the non-memcg code paths.

> > Ever since we went to
> > per-node memcg lru lists (~2015, iirc), there's been this special
> > hidden hack for the root memcg to use the global list. and one that
> > I have to read lots of code to remind myself it exists every time I
> > have to did into this code again.
> >
> > I mean, if the list is not memcg aware, then it only needs a single
> > list per node - the root memcg list. That could be easily stored in
> > the memcg_lrus array for the node rather than a separate global
> > list, and then the node iteration code all starts to look like this:
> >
> >         nlru = &lru->node[nid];
> >         for_each_mem_cgroup(memcg) {
> >                 spin_lock(&nlru->lock);
> >                 isolated += __list_lru_walk_one(nlru, memcg,
> >                                                 isolate, cb_arg,
> >                                                 nr_to_walk);
> >                 spin_unlock(&nlru->lock);
> >                 if (*nr_to_walk <= 0)
> >                         break;
> >
> >                 /* non-memcg aware lists only have a root memcg list */
> >                 if (!list_lru_memcg_aware(lru))
> >                         break;
> >         }
> >
> > Hence for the walks in the !list_lru_memcg_aware(lru) case, the
> > list_lru_from_memcg() call in __list_lru_walk_one() always returns
> > just the root list. This makes everything use the same iteration
> > interface, and when you configure out memcg then we simply code the
> > the iterator to run once and list_lru_from_memcg() always returns
> > the one list...
> >
> 
> Agree with you that it is a better architecture, and I also want to
> change it like this.
> That would be more clear and easier for maintiance.
> But it requires lots of code changes, should we address it in another
> separate patchset ?

Yes. I think this is a separate piece of work as it spans much more
than just the list-lru infrastructure.

> > And, FWIW, I noticed that the list_lru memcg code assumes we only
> > ever put objects from memcg associated slab pages in the list_lru.
> > It calls memcg_from_slab_page() which makes no attempt to verify the
> > page is actually a slab page. That's a landmine just waiting to get
> > boom - list-lru can store any type of object the user wants, not
> > just slab pages. e.g. the binder code stores pages in the list-lru,
> > not slab objects, and so the only reason it doesn't go boom is that
> > the lru-list is not configured to be memcg aware....
> >
> 
> Another new issue.
> I will try to dignose what hiddened in this landmine is, and after I
> understand it clearly I will submit a new patchset.

The problem is memcg_from_slab_page() uses page->slab_cache directly
to retreive the owner memcg without first checking the
PageSlab(page) flag. If it's not a slab page, we need to get the
memcg from page->memcg, not page->slab_cache->memcg_params.memcg.

see page_cgroup_ino() for how to safely get the owner memcg from a
random page of unknown type...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


  reply	other threads:[~2020-01-06 21:31 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-24  7:53 [PATCH v2 0/5] protect page cache from freeing inode Yafang Shao
2019-12-24  7:53 ` [PATCH v2 1/5] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
2019-12-26 21:23   ` Roman Gushchin
2019-12-27  1:03     ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 2/5] mm, memcg: introduce MEMCG_PROT_SKIP for memcg zero usage case Yafang Shao
2019-12-26 21:36   ` Roman Gushchin
2019-12-27  1:09     ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 3/5] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
2019-12-26 21:45   ` Roman Gushchin
2019-12-27  1:11     ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function Yafang Shao
2020-01-04  3:35   ` Dave Chinner
2020-01-04  7:26     ` Yafang Shao
2020-01-04 21:23       ` Dave Chinner
2020-01-05  1:43         ` Yafang Shao
2020-01-06  0:17           ` Dave Chinner
2020-01-06 14:41             ` Yafang Shao
2020-01-06 21:31               ` Dave Chinner [this message]
2020-01-07 13:22                 ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode Yafang Shao
2019-12-25 13:01   ` kbuild test robot
2019-12-25 13:18   ` kbuild test robot
2019-12-26  5:09     ` Yafang Shao
2020-01-04  3:55   ` Dave Chinner
2020-01-04  7:42     ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200106213103.GJ23195@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=dchinner@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox