From: Yafang Shao <laoar.shao@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>, Linux MM <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org, Roman Gushchin <guro@fb.com>,
Chris Down <chris@chrisdown.name>,
Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode
Date: Sat, 4 Jan 2020 15:42:21 +0800 [thread overview]
Message-ID: <CALOAHbBrSdeQjn0eR1bkhh=orGyk1O+NQa9Qt4vHTAsByG4PGA@mail.gmail.com> (raw)
In-Reply-To: <20200104035501.GE23195@dread.disaster.area>
On Sat, Jan 4, 2020 at 11:55 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, Dec 24, 2019 at 02:53:26AM -0500, Yafang Shao wrote:
> > On my server there're some running MEMCGs protected by memory.{min, low},
> > but I found the usage of these MEMCGs abruptly became very small, which
> > were far less than the protect limit. It confused me and finally I
> > found that was because of inode stealing.
> > Once an inode is freed, all its belonging page caches will be dropped as
> > well, no matter how may page caches it has. So if we intend to protect the
> > page caches in a memcg, we must protect their host (the inode) first.
> > Otherwise the memcg protection can be easily bypassed with freeing inode,
> > especially if there're big files in this memcg.
> >
> > Supposes we have a memcg, and the stat of this memcg is,
> > memory.current = 1024M
> > memory.min = 512M
> > And in this memcg there's a inode with 800M page caches.
> > Once this memcg is scanned by kswapd or other regular reclaimers,
> > kswapd <<<< It can be either of the regular reclaimers.
> > shrink_node_memcgs
> > switch (mem_cgroup_protected()) <<<< Not protected
> > case MEMCG_PROT_NONE: <<<< Will scan this memcg
> > beak;
> > shrink_lruvec() <<<< Reclaim the page caches
> > shrink_slab() <<<< It may free this inode and drop all its
> > page caches(800M).
> > So we must protect the inode first if we want to protect page caches.
> >
> > The inherent mismatch between memcg and inode is a trouble. One inode can
> > be shared by different MEMCGs, but it is a very rare case. If an inode is
> > shared, its belonging page caches may be charged to different MEMCGs.
> > Currently there's no perfect solution to fix this kind of issue, but the
> > inode majority-writer ownership switching can help it more or less.
>
> There's multiple changes in this patch set. Yes, there's some inode
> cache futzing to deal with memcgs, but it also adds some weird
> undocumented "in low reclaim" heuristic that does something magical
> with "protection" that you don't describe or document at all. PLease
> separate that out into a new patch and document it clearly (both in
> the commit message and the code, please).
>
Sure, I will separate it and document it in next version.
> > diff --git a/fs/inode.c b/fs/inode.c
> > index fef457a..4f4b2f3 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -54,6 +54,13 @@
> > * inode_hash_lock
> > */
> >
> > +struct inode_head {
>
> not an "inode head" structure. They are inode isolation arguments.
> i.e. struct inode_isolation_args {...};
>
Agree with you that inode_isolation_args is better.
While I have a different idea, what about using inode_isolation_control ?
scan_control : to scan all MEMCGs
v
shrink_control: to shrink all slabs in one memcg
v
inode_isolation_control: to shrink one slab (the inode)
And in the future we may introduce dentry_isolation_control and some others.
Anyway that's a minor issue.
> > + struct list_head *freeable;
> > +#ifdef CONFIG_MEMCG_KMEM
> > + struct mem_cgroup *memcg;
> > +#endif
> > +};
>
> These defines are awful, too, and completely unnecesarily because
> the struct shrink_control unconditionally defines sc->memcg and
> so we can just code it throughout without caring whether memcgs are
> enabled or not.
>
Sure, will change it in next version.
> > +
> > static unsigned int i_hash_mask __read_mostly;
> > static unsigned int i_hash_shift __read_mostly;
> > static struct hlist_head *inode_hashtable __read_mostly;
> > @@ -724,8 +731,10 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
> > static enum lru_status inode_lru_isolate(struct list_head *item,
> > struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> > {
> > - struct list_head *freeable = arg;
> > + struct inode_head *ihead = (struct inode_head *)arg;
>
> No need for a cast of a void *.
>
OK.
> > + struct list_head *freeable = ihead->freeable;
> > struct inode *inode = container_of(item, struct inode, i_lru);
> > + struct mem_cgroup *memcg = NULL;
>
> struct inode_isolation_args *iargs = arg;
> struct list_head *freeable = iargs->freeable;
> struct mem_cgroup *memcg = iargs->memcg;
> struct inode *inode = container_of(item, struct inode, i_lru);
>
Thanks. That looks better.
> > @@ -734,6 +743,15 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
> > if (!spin_trylock(&inode->i_lock))
> > return LRU_SKIP;
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> > + memcg = ihead->memcg;
> > +#endif
>
> This is no longer necessary.
>
Thanks.
> > + if (memcg && inode->i_data.nrpages &&
> > + !(memcg_can_reclaim_inode(memcg, inode))) {
> > + spin_unlock(&inode->i_lock);
> > + return LRU_ROTATE;
> > + }
>
> I'd argue that both the memcg and the inode->i_data.nrpages check
> should be inside memcg_can_reclaim_inode(), that way this entire
> chunk of code disappears when CONFIG_MEMCG_KMEM=N.
>
I will think about it and make the code better when CONFIG_MEMCG_KMEM=N.
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index f36ada9..d1d4175 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -247,6 +247,9 @@ struct mem_cgroup {
> > unsigned int tcpmem_active : 1;
> > unsigned int tcpmem_pressure : 1;
> >
> > + /* Soft protection will be ignored if it's true */
> > + unsigned int in_low_reclaim : 1;
>
> This is the stuff that has nothing to do with the changes to the
> inode reclaim shrinker...
>
In next version I will drop this in_low_reclaim and using the
memcg_low_reclaim passed from struct scan_control.
Thanks
Yafang
prev parent reply other threads:[~2020-01-04 7:43 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-24 7:53 [PATCH v2 0/5] " Yafang Shao
2019-12-24 7:53 ` [PATCH v2 1/5] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
2019-12-26 21:23 ` Roman Gushchin
2019-12-27 1:03 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 2/5] mm, memcg: introduce MEMCG_PROT_SKIP for memcg zero usage case Yafang Shao
2019-12-26 21:36 ` Roman Gushchin
2019-12-27 1:09 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 3/5] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
2019-12-26 21:45 ` Roman Gushchin
2019-12-27 1:11 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function Yafang Shao
2020-01-04 3:35 ` Dave Chinner
2020-01-04 7:26 ` Yafang Shao
2020-01-04 21:23 ` Dave Chinner
2020-01-05 1:43 ` Yafang Shao
2020-01-06 0:17 ` Dave Chinner
2020-01-06 14:41 ` Yafang Shao
2020-01-06 21:31 ` Dave Chinner
2020-01-07 13:22 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode Yafang Shao
2019-12-25 13:01 ` kbuild test robot
2019-12-25 13:18 ` kbuild test robot
2019-12-26 5:09 ` Yafang Shao
2020-01-04 3:55 ` Dave Chinner
2020-01-04 7:42 ` Yafang Shao [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CALOAHbBrSdeQjn0eR1bkhh=orGyk1O+NQa9Qt4vHTAsByG4PGA@mail.gmail.com' \
--to=laoar.shao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=chris@chrisdown.name \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=vdavydov.dev@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox