From: Dave Chinner <david@fromorbit.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yafang Shao <laoar.shao@gmail.com>,
Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>, Linux MM <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/4] memcg, inode: protect page cache from freeing inode
Date: Wed, 18 Dec 2019 12:51:24 +1100
Message-ID: <20191218015124.GS19213@dread.disaster.area>
In-Reply-To: <20191217165422.GA213613@cmpxchg.org>

On Tue, Dec 17, 2019 at 11:54:22AM -0500, Johannes Weiner wrote:
> CCing Dave
>
> On Tue, Dec 17, 2019 at 08:19:08PM +0800, Yafang Shao wrote:
> > On Tue, Dec 17, 2019 at 7:56 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > What do you mean by this exactly. Are those inodes reclaimed by the
> > > regular memory reclaim or by other means? Because shrink_node does
> > > exclude shrinking slab for protected memcgs.
> >
> > By the regular memory reclaim, kswapd, direct reclaimer or memcg reclaimer.
> > IOW, the current->reclaim_state is set.
> >
> > Here is an example.
> >
> > kswapd
> >     balance_pgdat
> >         shrink_node_memcgs
> >             switch (mem_cgroup_protected) <<<< memory.current = 1024M,
> >                                                memory.min = 512M, a file has 800M of page cache
> >                 case MEMCG_PROT_NONE:  <<<< hard limit is not reached.
> >                     break;
> >             shrink_lruvec
> >             shrink_slab <<< it may free the inode and then free all its
> >                             page cache (800M)
<looks at patch>
Oh, great, yet another special heuristic reclaim hack for some
whacky memcg reclaim corner case.
> This problem exists independent of cgroup protection.
>
> The inode shrinker may take down an inode that's still holding a ton
> of (potentially active) page cache pages when the inode hasn't been
> referenced recently.
Ok, please explain to me how those pages are getting repeatedly
referenced and kept active without the inode being referenced in
some way?

e.g. active mmap pins a struct file which pins the inode.
e.g. open fd pins a struct file which pins the inode.
e.g. open/read/write/close keeps a dentry active in cache which pins
the inode when not actively referenced by the open fd.

AFAIA, all of the cases where -file pages- are being actively
referenced also require actively referencing the inode in some way.
So why is the inode being reclaimed as an unreferenced inode at the
end of the LRU if these are actively referenced file pages?
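To make the pinning argument concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: struct toy_inode and the
toy_* helpers are hypothetical names, and it assumes every holder
(open fd, mmap, cached dentry) takes its reference directly on the
inode, which flattens the real file -> dentry -> inode chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct inode; only the reference count is modelled. */
struct toy_inode {
	int i_count;	/* holders: open files, mmaps, cached dentries */
};

/* Take a reference, as an open fd, an mmap or a cached dentry would. */
void toy_inode_get(struct toy_inode *inode)
{
	inode->i_count++;
}

/* Drop a reference previously taken with toy_inode_get(). */
void toy_inode_put(struct toy_inode *inode)
{
	assert(inode->i_count > 0);
	inode->i_count--;
}

/* Only an inode with no holders is a candidate for the inode shrinker. */
bool toy_inode_unreferenced(const struct toy_inode *inode)
{
	return inode->i_count == 0;
}
```

In this model an open/read/close cycle leaves the inode pinned for as
long as the dentry reference is held, so it only becomes a shrinker
candidate once the last holder has gone away.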
> IMO we shouldn't be dropping data that the VM still considers hot
> compared to other data, just because the inode object hasn't been used
> as recently as other inode objects (e.g. drowned in a stream of
> one-off inode accesses).
It should not be drowned by one-off inode accesses because if
the file data is being actively referenced then there should be
frequent active references to the inode that contains the data and
that should be keeping it away from the tail of the inode LRU.

If the inode is not being frequently referenced, then it
isn't really part of the current working set of inodes, is it?
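The "referenced inodes stay away from the tail" behaviour can be
sketched as a second-chance LRU. This is a userspace toy under stated
assumptions: the toy_* names and the fixed-size array are made up for
illustration, and the referenced flag only stands in for I_REFERENCED.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_LRU_MAX 8

struct toy_inode {
	int  ino;
	bool referenced;	/* stand-in for I_REFERENCED */
};

struct toy_lru {
	struct toy_inode *slot[TOY_LRU_MAX];	/* slot[0] is the tail (oldest) */
	size_t len;
};

/* Add an inode at the head (newest end) of the list. */
bool toy_lru_add(struct toy_lru *lru, struct toy_inode *inode)
{
	if (lru->len == TOY_LRU_MAX)
		return false;
	lru->slot[lru->len++] = inode;
	return true;
}

/* Remove and return the tail entry, shifting the rest down. */
static struct toy_inode *toy_lru_pop_tail(struct toy_lru *lru)
{
	struct toy_inode *inode = lru->slot[0];

	for (size_t i = 1; i < lru->len; i++)
		lru->slot[i - 1] = lru->slot[i];
	lru->len--;
	return inode;
}

/*
 * Scan from the tail. A referenced inode has its flag cleared and is
 * rotated back to the head (one more pass); the first unreferenced
 * inode found is reclaimed and returned. Returns NULL on an empty list.
 */
struct toy_inode *toy_lru_reclaim_one(struct toy_lru *lru)
{
	while (lru->len > 0) {
		struct toy_inode *inode = toy_lru_pop_tail(lru);

		if (inode->referenced) {
			inode->referenced = false;
			lru->slot[lru->len++] = inode;	/* rotate */
			continue;
		}
		return inode;
	}
	return NULL;
}
```

In this model an inode that is touched again between scans keeps
getting rotated while one-off inodes behind it are reclaimed. An inode
that is never touched again loses its flag on the first pass and is
reclaimed on the next one.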
> I've carried the below patch in my private tree for testing cache
> aging decisions that the shrinker interfered with. (It would be nicer
> if page cache pages could pin the inode of course, but reclaim cannot
> easily participate in the inode refcounting scheme.)
>
> Thoughts?
>
> diff --git a/fs/inode.c b/fs/inode.c
> index fef457a42882..bfcaaaf6314f 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -753,7 +753,13 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
>  		return LRU_ROTATE;
>  	}
>  
> -	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
> +	/* Leave the pages to page reclaim */
> +	if (inode->i_data.nrpages) {
> +		spin_unlock(&inode->i_lock);
> +		return LRU_ROTATE;
> +	}
<sigh>
Remember this?
commit 69056ee6a8a3d576ed31e38b3b14c70d6c74edcc
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Feb 12 15:35:51 2019 -0800

    Revert "mm: don't reclaim inodes with many attached pages"

    This reverts commit a76cf1a474d7d ("mm: don't reclaim inodes with many
    attached pages").

    This change causes serious changes to page cache and inode cache
    behaviour and balance, resulting in major performance regressions when
    combining workloads such as large file copies and kernel compiles.

    https://bugzilla.kernel.org/show_bug.cgi?id=202441

    This change is a hack to work around the problems introduced by changing
    how aggressive shrinkers are on small caches in commit 172b06c32b94 ("mm:
    slowly shrink slabs with a relatively small number of objects"). It
    creates more problems than it solves, wasn't adequately reviewed or
    tested, so it needs to be reverted.

    Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
    Fixes: a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Cc: Wolfgang Walter <linux@stwm.de>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Spock <dairinin@gmail.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/fs/inode.c b/fs/inode.c
index 0cd47fe0dbe5..73432e64f874 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -730,11 +730,8 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 		return LRU_REMOVED;
 	}
 
-	/*
-	 * Recently referenced inodes and inodes with many attached pages
-	 * get one more pass.
-	 */
-	if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
+	/* recently referenced inodes get one more pass */
+	if (inode->i_state & I_REFERENCED) {
 		inode->i_state &= ~I_REFERENCED;
 		spin_unlock(&inode->i_lock);
 		return LRU_ROTATE;
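
For comparison, the three rotate-or-reclaim decisions quoted in this
mail can be put side by side in a userspace sketch. It is a rough
model based only on the hunks shown here: the toy_* names are
hypothetical, locking and the buffer/page invalidation path are left
out, and the proposed hunk is assumed to run after the I_REFERENCED
check, as its context lines suggest.

```c
#include <stdbool.h>

enum toy_lru_status { TOY_LRU_RECLAIM, TOY_LRU_ROTATE };

struct toy_inode {
	bool referenced;	/* stand-in for I_REFERENCED */
	unsigned long nrpages;	/* stand-in for inode->i_data.nrpages */
};

/* Behaviour of a76cf1a474d7d, which the commit above reverts. */
enum toy_lru_status toy_isolate_many_pages(struct toy_inode *inode)
{
	if (inode->referenced || inode->nrpages > 1) {
		inode->referenced = false;
		return TOY_LRU_ROTATE;
	}
	return TOY_LRU_RECLAIM;
}

/* Behaviour after the revert: only the referenced flag matters. */
enum toy_lru_status toy_isolate_referenced_only(struct toy_inode *inode)
{
	if (inode->referenced) {
		inode->referenced = false;
		return TOY_LRU_ROTATE;
	}
	return TOY_LRU_RECLAIM;
}

/* The proposed hunk: any attached page cache means rotate. */
enum toy_lru_status toy_isolate_leave_pages(struct toy_inode *inode)
{
	if (inode->referenced) {
		inode->referenced = false;
		return TOY_LRU_ROTATE;
	}
	if (inode->nrpages)
		return TOY_LRU_ROTATE;
	return TOY_LRU_RECLAIM;
}
```

For an unreferenced inode holding 800M of page cache (204800 pages,
assuming 4 KiB pages), the reverted heuristic and the proposed hunk
both rotate the inode in this model, while the post-revert code
reclaims it.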
--
Dave Chinner
david@fromorbit.com