linux-mm.kvack.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Matthew Wilcox <willy@infradead.org>, Dave Chinner <david@fromorbit.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
	Stephen Brennan <stephen.s.brennan@oracle.com>,
	lsf-pc@lists.linux-foundation.org,
	 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>,
	khlebnikov@yandex-team.ru
Subject: Re: [LSF/MM TOPIC] Better handling of negative dentries
Date: Tue, 22 Mar 2022 15:19:15 -0400	[thread overview]
Message-ID: <a8f6ea9ec9b8f4d9b48e97fe1236f80b62b76dc1.camel@HansenPartnership.com> (raw)
In-Reply-To: <YjnmcaHhE1F2oTcH@casper.infradead.org>

On Tue, 2022-03-22 at 15:08 +0000, Matthew Wilcox wrote:
> On Wed, Mar 16, 2022 at 01:52:23PM +1100, Dave Chinner wrote:
> > On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> > > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > > > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <
> > > > > willy@infradead.org> wrote:
> > > > > 
> > > > > The number of negative dentries is effectively constrained
> > > > > only by memory
> > > > > size.  Systems which do not experience significant memory
> > > > > pressure for
> > > > > an extended period can build up millions of negative dentries
> > > > > which
> > > > > clog the dcache.  That can have different symptoms, such as
> > > > > inotify
> > > > > taking a long time [1], high memory usage [2] and even just
> > > > > poor lookup
> > > > > performance [3].  We've also seen problems with cgroups being
> > > > > pinned
> > > > > by negative dentries, though I think we now reparent those
> > > > > dentries to
> > > > > their parent cgroup instead.
> > > > 
> > > > Yes, it should be fixed already.
> > > > 
> > > > > We don't have a really good solution yet, and maybe some
> > > > > focused
> > > > > brainstorming on the problem would lead to something that
> > > > > actually works.
> > > > 
> > > > I’d be happy to join this discussion. And in my opinion it’s
> > > > going beyond negative dentries: there are other types of
> > > > objects which tend to grow beyond any reasonable limits if
> > > > there is no memory pressure.
> > > 
> > > +1, we once had a similar issue as well, and agree that is not
> > > only
> > > limited to negative dentries but all too many LRU-ed dentries and
> > > inodes.
> > 
> > Yup, any discussion solely about managing buildup of negative
> > dentries doesn't acknowledge that it is just a symptom of larger
> > problems that need to be addressed.
> 
> Yes, but let's not make this _so_ broad a discussion that it becomes
> unsolvable.  Rather, let's look for a solution to this particular
> problem that can be adopted by other caches that share a similar
> problem.

Well, firstly, what is the exact problem?  Is it people maliciously
looking up nonexistent files and causing the negative dentries to
balloon, or simply dentry management for a well-behaved workload (and
what would good examples of that be)?

> For example, we might be seduced into saying "this is a slab problem"
> because all the instances we have here allocate from slab.  But slab
> doesn't have enough information to solve the problem.  Maybe the
> working set of the current workload really needs 6 million dentries
> to perform optimally.  Maybe it needs 600.  Slab can't know
> that.  Maybe slab can play a role here, but the only component which
> can know the appropriate size for a cache is the cache itself.

Could we do something depending on the age of the oldest dentry on the
lru list?  We don't currently keep a last accessed time, but we could.

> I think the logic needs to be in d_alloc().  Before it calls
> __d_alloc(), it should check ... something ... to see if it should
> try to shrink the LRU list.

We could also do it in kill_dentry() or retain_dentry() ... we're past
all the fast paths when they start running.  They also have a view of
the liveness of the dentry.  Chances are that if you get a negative
dentry back in kill_dentry(), it isn't going to be instantiated, so its
only use is to cache -ENOENT, and we could apply more aggressive
shrinking heuristics.

>   The devil is in what that something
> should be.  I'm no expert on the dcache; do we just want to call
> prune_dcache_sb() for every 1/1000 time?  Rely on DCACHE_REFERENCED
> to make sure that we're not over-pruning the list?  If so, what do we
> set nr_to_scan to?  1000 so that we try to keep the dentry list the
> same size?  1500 so that it actually tries to shrink?

Fundamentally, a negative dentry that doesn't get promoted to a
positive one had better be looked up pretty often for us to keep it in
the cache, I think?  So last-accessed time seems to be a good
indicator.

We also have to scan the whole list for negative dentries; perhaps they
would get on better with their own LRU list (although that complicates
the instantiation case, and perhaps we can afford the time to walk
backwards over the LRU list in alloc/kill anyway, so there would be no
need for a separate list).

> I don't feel like I know enough to go further here.  But it feels
> better than what we currently do -- calling all the shrinkers from
> deep in the page allocator.

Well, yes, but that's not a high bar.

James




Thread overview: 18+ messages
2022-03-15 19:55 Matthew Wilcox
2022-03-15 20:56 ` Roman Gushchin
2022-03-16  2:07   ` Gao Xiang
2022-03-16  2:52     ` Dave Chinner
2022-03-16  3:08       ` Gao Xiang
2022-03-22 15:08       ` Matthew Wilcox
2022-03-22 19:19         ` James Bottomley [this message]
2022-03-22 20:17           ` Colin Walters
2022-03-22 20:27             ` James Bottomley
2022-03-22 20:37             ` Matthew Wilcox
2022-03-22 21:08               ` Stephen Brennan
2022-03-29 15:24                 ` James Bottomley
2022-03-31 19:27                   ` Stephen Brennan
2022-04-01 15:45                     ` James Bottomley
2022-03-22 22:21         ` Dave Chinner
2022-03-22 20:41   ` Matthew Wilcox
2022-03-22 21:19     ` Roman Gushchin
2022-03-22 22:29       ` Dave Chinner
