From: Johannes Weiner <hannes@cmpxchg.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Vlastimil Babka <vbabka@suse.cz>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Greg Thelen <gthelen@google.com>,
Christoph Hellwig <hch@infradead.org>,
Hugh Dickins <hughd@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan.kim@gmail.com>,
Michel Lespinasse <walken@google.com>,
Seth Jennings <sjenning@linux.vnet.ibm.com>,
Roman Gushchin <klamm@yandex-team.ru>,
Ozgun Erdogan <ozgun@citusdata.com>,
Metin Doslu <metin@citusdata.com>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [patch 9/9] mm: keep page cache radix tree nodes in check
Date: Tue, 26 Nov 2013 18:00:10 -0500 [thread overview]
Message-ID: <20131126230010.GJ22729@cmpxchg.org> (raw)
In-Reply-To: <20131126222937.GA10988@dastard>
On Wed, Nov 27, 2013 at 09:29:37AM +1100, Dave Chinner wrote:
> On Tue, Nov 26, 2013 at 04:27:25PM -0500, Johannes Weiner wrote:
> > On Tue, Nov 26, 2013 at 10:49:21AM +1100, Dave Chinner wrote:
> > > On Sun, Nov 24, 2013 at 06:38:28PM -0500, Johannes Weiner wrote:
> > > > Previously, page cache radix tree nodes were freed after reclaim
> > > > emptied out their page pointers. But now reclaim stores shadow
> > > > entries in their place, which are only reclaimed when the inodes
> > > > themselves are reclaimed. This is problematic for bigger files that
> > > > are still in use after they have a significant amount of their cache
> > > > reclaimed, without any of those pages actually refaulting. The shadow
> > > > entries will just sit there and waste memory. In the worst case, the
> > > > shadow entries will accumulate until the machine runs out of memory.
> ....
> > > ....
> > > > + radix_tree_replace_slot(slot, page);
> > > > + if (node) {
> > > > + node->count++;
> > > > + /* Installed page, can't be shadow-only anymore */
> > > > + if (!list_empty(&node->lru))
> > > > + list_lru_del(&workingset_shadow_nodes, &node->lru);
> > > > + }
> > > > + return 0;
> > >
> > > Hmmmmm - what's the overhead of direct management of LRU removal
> > > here? Most list_lru code uses lazy removal (i.e. via the shrinker)
> > > to avoid having to touch the LRU when adding new references to an
> > > object.....
> >
> > It's measurable in microbenchmarks, but not when any real IO is
> > involved. The difference was in the noise even on SSD drives.
>
> Well, it's not an SSD or two I'm worried about - it's devices that
> can do millions of IOPS where this is likely to be noticable...
>
> > The other list_lru users see items only once they become unused and
> > subsequent references are expected to be few and temporary, right?
>
> They go onto the list when the refcount falls to zero, but reuse can
> be frequent when being referenced repeatedly by a single user. That
> avoids every reuse from removing the object from the LRU then
> putting it back on the LRU for every reference cycle...
That's true, but it's less of a concern in the radix_tree_node case
because it takes a full inactive list cycle after a refault before the
node is put back on the LRU. Or a really unlikely placed partial node
truncation/invalidation (full truncation would just delete the whole
node anyway).
> > We expect pages to refault in spades on certain loads, at which point
> > we may have thousands of those nodes on the list that are no longer
> > reclaimable (10k nodes for about 2.5G of cache).
>
> Sure, look at the way the inode and dentry caches work - entire
> caches of millions of inodes and dentries often sit on the LRUs. A
> quick look at my workstations dentry cache shows:
>
> $ at /proc/sys/fs/dentry-state
> 180108 170596 45 0 0 0
>
> 180k allocated dentries, 170k sitting on the LRU...
Hm, and a significant amount of those 170k could rotate on the next
shrinker scan due to recent references or do you generally have
smaller spikes?
But as per above I think the case for lazily removing shadow nodes is
less convincing than for inodes and dentries.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-11-26 23:00 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-24 23:38 [patch 0/9] mm: thrash detection-based file cache sizing v6 Johannes Weiner
2013-11-24 23:38 ` [patch 1/9] fs: cachefiles: use add_to_page_cache_lru() Johannes Weiner
2013-11-24 23:38 ` [patch 2/9] lib: radix-tree: radix_tree_delete_item() Johannes Weiner
2013-11-25 8:21 ` Minchan Kim
2013-11-24 23:38 ` [patch 3/9] mm: shmem: save one radix tree lookup when truncating swapped pages Johannes Weiner
2013-11-25 8:21 ` Minchan Kim
2013-11-24 23:38 ` [patch 4/9] mm: filemap: move radix tree hole searching here Johannes Weiner
2013-11-24 23:38 ` [patch 5/9] mm + fs: prepare for non-page entries in page cache radix trees Johannes Weiner
2013-11-24 23:38 ` [patch 6/9] mm + fs: store shadow entries in page cache Johannes Weiner
2013-11-25 23:17 ` Dave Chinner
2013-11-26 10:20 ` Peter Zijlstra
2013-11-27 16:45 ` Johannes Weiner
2013-11-27 17:08 ` Johannes Weiner
2013-11-27 23:32 ` Dave Chinner
2013-11-24 23:38 ` [patch 7/9] mm: thrash detection-based file cache sizing Johannes Weiner
2013-11-25 23:50 ` Andrew Morton
2013-11-26 2:15 ` Johannes Weiner
2013-11-26 1:56 ` Ryan Mallon
2013-11-26 20:57 ` Johannes Weiner
2013-11-24 23:38 ` [patch 8/9] lib: radix_tree: tree node interface Johannes Weiner
2013-11-24 23:38 ` [patch 9/9] mm: keep page cache radix tree nodes in check Johannes Weiner
2013-11-25 23:49 ` Dave Chinner
2013-11-26 21:27 ` Johannes Weiner
2013-11-26 22:29 ` Dave Chinner
2013-11-26 23:00 ` Johannes Weiner [this message]
2013-11-27 0:59 ` Dave Chinner
2013-11-26 0:13 ` Andrew Morton
2013-11-26 22:05 ` Johannes Weiner
2013-11-26 0:57 ` [patch 0/9] mm: thrash detection-based file cache sizing v6 Andrew Morton
2013-11-26 22:30 ` Johannes Weiner
2013-11-28 4:40 ` Johannes Weiner
2013-12-02 19:21 [patch 0/9] mm: thrash detection-based file cache sizing v7 Johannes Weiner
2013-12-02 19:21 ` [patch 9/9] mm: keep page cache radix tree nodes in check Johannes Weiner
2013-12-02 22:10 ` Dave Chinner
2013-12-02 22:46 ` Johannes Weiner
2014-01-10 18:10 [patch 0/9] mm: thrash detection-based file cache sizing v8 Johannes Weiner
2014-01-10 18:10 ` [patch 9/9] mm: keep page cache radix tree nodes in check Johannes Weiner
2014-01-10 23:09 ` Rik van Riel
2014-01-13 7:39 ` Minchan Kim
2014-01-14 5:40 ` Minchan Kim
2014-01-22 18:42 ` Johannes Weiner
2014-01-23 5:20 ` Minchan Kim
2014-01-23 19:22 ` Johannes Weiner
2014-01-27 2:31 ` Minchan Kim
2014-01-15 5:55 ` Bob Liu
2014-01-16 22:09 ` Johannes Weiner
2014-01-17 0:05 ` Dave Chinner
2014-01-20 23:17 ` Johannes Weiner
2014-01-21 3:03 ` Dave Chinner
2014-01-21 5:50 ` Johannes Weiner
2014-01-22 3:06 ` Dave Chinner
2014-01-22 6:57 ` Johannes Weiner
2014-01-22 18:48 ` Johannes Weiner
2014-01-23 5:57 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131126230010.GJ22729@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=david@fromorbit.com \
--cc=gthelen@google.com \
--cc=hch@infradead.org \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=klamm@yandex-team.ru \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=metin@citusdata.com \
--cc=mgorman@suse.de \
--cc=minchan.kim@gmail.com \
--cc=ozgun@citusdata.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=sjenning@linux.vnet.ibm.com \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox