From: Dave Chinner <david@fromorbit.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Roman Gushchin <guro@fb.com>, Chris Down <chris@chrisdown.name>,
Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 4/4] memcg, inode: protect page cache from freeing inode
Date: Wed, 18 Dec 2019 13:21:22 +1100
Message-ID: <20191218022122.GT19213@dread.disaster.area>
In-Reply-To: <1576582159-5198-5-git-send-email-laoar.shao@gmail.com>
On Tue, Dec 17, 2019 at 06:29:19AM -0500, Yafang Shao wrote:
> On my server there're some running MEMCGs protected by memory.{min, low},
> but I found the usage of these MEMCGs abruptly became very small, which
> were far less than the protect limit. It confused me and finally I
> found that was because of inode stealing.
> Once an inode is freed, all its belonging page caches will be dropped as
> well, no matter how many page caches it has. So if we intend to protect the
> page caches in a memcg, we must protect their host (the inode) first.
> Otherwise the memcg protection can be easily bypassed with freeing inode,
> especially if there're big files in this memcg.
> The inherent mismatch between memcg and inode is a trouble. One inode can
> be shared by different MEMCGs, but it is a very rare case. If an inode is
> shared, its belonging page caches may be charged to different MEMCGs.
> Currently there's no perfect solution to fix this kind of issue, but the
> inode majority-writer ownership switching can help it more or less.
>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Chris Down <chris@chrisdown.name>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> fs/inode.c | 9 +++++++++
> include/linux/memcontrol.h | 15 +++++++++++++++
> mm/memcontrol.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> mm/vmscan.c | 4 ++++
> 4 files changed, 74 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index fef457a..b022447 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -734,6 +734,15 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
> if (!spin_trylock(&inode->i_lock))
> return LRU_SKIP;
>
> +
> + /* Page protection only works in reclaimer */
> + if (inode->i_data.nrpages && current->reclaim_state) {
> + if (mem_cgroup_inode_protected(inode)) {
> + spin_unlock(&inode->i_lock);
> + return LRU_ROTATE;
Urk, so after having plumbed the memcg all the way down to the
list_lru walk code so that we only walk inodes in that memcg, we now
have to do a lookup from the inode back to the owner memcg to
determine if we should reclaim it? IOWs, I think the layering here
is all wrong - if memcg info is needed in the shrinker, it should
come from the shrink_control->memcg pointer, not be looked up from
the object being isolated...
i.e. this code should read something like this:

	if (memcg && inode->i_data.nrpages &&
	    !memcg_can_reclaim_inode(memcg, inode)) {
		spin_unlock(&inode->i_lock);
		return LRU_ROTATE;
	}

This code does not need comments because it is obvious what it does,
and it provides a generic hook into inode reclaim for the memcg code
to decide whether the shrinker should reclaim the inode or not.
This is how the memcg code should interact with other shrinkers, too
(e.g. the dentry cache isolation function), so you need to look at
how to make the memcg visible to the lru walker isolation
functions....
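
To make the layering concrete, here is a rough userspace mock-up of an isolate callback that takes the memcg from the walk context instead of looking it up from the inode. Everything in it is an assumption for illustration: the struct definitions are simplified stand-ins rather than the kernel's, `struct inode_walk_ctx` and `inode_lru_isolate_sketch()` are hypothetical names, the i_lock handling is left out, and the "usage above min" policy inside `memcg_can_reclaim_inode()` is only a placeholder for whatever the memcg code would really decide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; these are NOT the kernel definitions. */
enum lru_status { LRU_REMOVED, LRU_ROTATE, LRU_SKIP };

struct mem_cgroup {
	unsigned long usage;	/* pages currently charged (assumed field) */
	unsigned long min;	/* protection threshold (assumed field) */
};

struct inode {
	unsigned long nrpages;	/* stands in for inode->i_data.nrpages */
};

/*
 * Hypothetical walk context: the shrinker would fill this in from
 * shrink_control->memcg and hand it to the isolate callback.
 */
struct inode_walk_ctx {
	struct mem_cgroup *memcg;	/* NULL means global reclaim */
};

/*
 * Hypothetical memcg hook. The policy is a placeholder: the inode may
 * be reclaimed only while the memcg sits above its protection.
 */
static bool memcg_can_reclaim_inode(const struct mem_cgroup *memcg,
				    const struct inode *inode)
{
	(void)inode;
	return memcg->usage > memcg->min;
}

/* Isolate callback sketch: the memcg comes from the walk, not the inode. */
static enum lru_status inode_lru_isolate_sketch(struct inode *inode,
						void *cb_arg)
{
	struct inode_walk_ctx *ctx = cb_arg;

	assert(ctx != NULL);

	if (ctx->memcg && inode->nrpages &&
	    !memcg_can_reclaim_inode(ctx->memcg, inode))
		return LRU_ROTATE;	/* keep the inode and its page cache */

	return LRU_REMOVED;		/* inode may be disposed of */
}
```

In this mock-up the only thing the isolate function knows about memcg is the pointer it was given, so the same pattern could in principle be reused by another isolate callback without any object-to-memcg lookup.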
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com