linux-mm.kvack.org archive mirror
* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
       [not found] ` <CA+55aFyH6dHw-7R3364dn32J4p7kxT=TqmnuozCn9_Bz-MHhxQ@mail.gmail.com>
@ 2018-07-02 21:18   ` Andrew Morton
  2018-07-02 22:21     ` Matthew Wilcox
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Andrew Morton @ 2018-07-02 21:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, Al Viro, Linux Kernel Mailing List, linux-fsdevel,
	Jan Kara, Paul McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
> >
> > A rogue application can potentially create a large number of negative
> > dentries in the system consuming most of the memory available if it
> > is not under the direct control of a memory controller that enforce
> > kernel memory limit.
> 
> I certainly don't mind the patch series, but I would like it to be
> accompanied with some actual example numbers, just to make it all a
> bit more concrete.
> 
> Maybe even performance numbers showing "look, I've filled the dentry
> lists with nasty negative dentries, now it's all slower because we
> walk those less interesting entries".
> 

(Please cc linux-mm@kvack.org on this work)

Yup.  The description of the user-visible impact of current behavior is
far too vague.

In the [5/6] changelog it is mentioned that a large number of -ve
dentries can lead to oom-killings.  This sounds bad - -ve dentries
should be trivially reclaimable and we shouldn't be oom-killing in such
a situation.

Dumb question: do we know that negative dentries are actually
worthwhile?  Has anyone checked in the past couple of decades?  Perhaps
our lookups are so whizzy nowadays that we don't need them?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 21:18   ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Andrew Morton
@ 2018-07-02 22:21     ` Matthew Wilcox
  2018-07-02 22:31       ` Linus Torvalds
  2018-07-02 22:34     ` James Bottomley
  2018-07-03  1:11     ` Waiman Long
  2 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2018-07-02 22:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 02, 2018 at 02:18:11PM -0700, Andrew Morton wrote:
> In the [5/6] changelog it is mentioned that a large number of -ve
> dentries can lead to oom-killings.  This sounds bad - -ve dentries
> should be trivially reclaimable and we shouldn't be oom-killing in such
> a situation.
> 
> Dumb question: do we know that negative dentries are actually
> worthwhile?  Has anyone checked in the past couple of decades?  Perhaps
> our lookups are so whizzy nowadays that we don't need them?

I can't believe that's true.  Have you looked at strace of a typical
program startup recently?

$ strace -o ls.out ls 
$ grep -c ENOENT ls.out 
10

There's a few duplicates in there (6 accesses to /etc/ld.so.nohwcap), so
we definitely want those negative entries.


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:21     ` Matthew Wilcox
@ 2018-07-02 22:31       ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2018-07-02 22:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Jul 02, 2018 at 02:18:11PM -0700, Andrew Morton wrote:
> >
> > Dumb question: do we know that negative dentries are actually
> > worthwhile?  Has anyone checked in the past couple of decades?  Perhaps
> > our lookups are so whizzy nowadays that we don't need them?
>
> I can't believe that's true.

Yeah, I'm with Matthew.

Negative dentries are absolutely *critical*. We have a shit-ton of
stuff that walks various PATH-like things, trying to open a file in
one directory after another.

They also happen to be really fundamental to how the dentry cache
itself works, with operations like "rename()" fundamentally depending
on negative dentries.

Sure, that "fundamental to rename()" could still be something that
isn't actually ever *cached*, but the thing about many filesystems is
that it's actually much more expensive to look up a file that doesn't
exist than it is to look up one that does.

That can be true even with things like hashed lookups, although it's
more obviously true with legacy filesystems.

Calling down to the filesystem every time you wonder "do I have a
/usr/local/bin/cat binary" would be absolutely horrid.
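That PATH walk can be sketched from userspace (illustrative only, not from the thread; `cat` as the probe binary and the miss counting are arbitrary choices):

```shell
# Count how many $PATH directories are probed and miss before "cat"
# is found.  Each miss is exactly the kind of lookup a negative
# dentry short-circuits on the next PATH search.
misses=0
IFS=:
for d in $PATH; do
  if [ -x "$d/cat" ]; then
    break                      # found it; stop walking the PATH
  fi
  misses=$((misses + 1))       # lookup failed; kernel caches a negative dentry
done
unset IFS
echo "directories probed and missed: $misses"
```

Every shell, linker, and interpreter start-up repeats walks like this, so those misses recur constantly.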

Also, honestly, I think the oom-killing thing says more about the
issues we've had with the memory freeing code than about negative
dentries. But the one problem with negative dentries is that it's
fairly easy to create a shit-ton of them, so while we absolutely don't
want to get rid of the concept, I do agree that having a limiter is a
fine fine idea.

                Linus


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 21:18   ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Andrew Morton
  2018-07-02 22:21     ` Matthew Wilcox
@ 2018-07-02 22:34     ` James Bottomley
  2018-07-02 22:54       ` Linus Torvalds
  2018-07-02 23:19       ` Andrew Morton
  2018-07-03  1:11     ` Waiman Long
  2 siblings, 2 replies; 18+ messages in thread
From: James Bottomley @ 2018-07-02 22:34 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds
  Cc: Waiman Long, Al Viro, Linux Kernel Mailing List, linux-fsdevel,
	Jan Kara, Paul McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
> dation.org> wrote:
> 
> > On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
> > wrote:
> > > 
> > > A rogue application can potentially create a large number of
> > > negative
> > > dentries in the system consuming most of the memory available if
> > > it
> > > is not under the direct control of a memory controller that
> > > enforce
> > > kernel memory limit.
> > 
> > I certainly don't mind the patch series, but I would like it to be
> > accompanied with some actual example numbers, just to make it all a
> > bit more concrete.
> > 
> > Maybe even performance numbers showing "look, I've filled the
> > dentry
> > lists with nasty negative dentries, now it's all slower because we
> > walk those less interesting entries".
> > 
> 
> (Please cc linux-mm@kvack.org on this work)
> 
> Yup.  The description of the user-visible impact of current behavior
> is far too vague.
> 
> In the [5/6] changelog it is mentioned that a large number of -ve
> dentries can lead to oom-killings.  This sounds bad - -ve dentries
> should be trivially reclaimable and we shouldn't be oom-killing in
> such a situation.

If you're old enough, it's déjà vu; Andrea went on a negative dentry
rampage about 15 years ago:

https://lkml.org/lkml/2002/5/24/71

I think the summary of the thread is that it's not worth it because
dentries are a clean cache, so they're immediately shrinkable.

> Dumb question: do we know that negative dentries are actually
> worthwhile?  Has anyone checked in the past couple of
> decades?  Perhaps our lookups are so whizzy nowadays that we don't
> need them?

There are still a lot of applications that keep looking up non-existent 
files, so I think it's still beneficial to keep them.  Apparently
apache still looks for a .htaccess file in every directory it
traverses, for instance.  Round tripping every one of these to disk
instead of caching it as a negative dentry would seem to be a
performance loser here.

However, actually measuring this again might be useful.
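A minimal userspace measurement along those lines might look like this (a sketch, not from the thread; the path and iteration count are arbitrary):

```shell
# Time repeated lookups of a name that does not exist.  After the
# first miss, the negative dentry should satisfy later lookups from
# the dcache without hitting the filesystem.
missing=/tmp/htaccess-bench.$$/.htaccess   # guaranteed-absent name
i=0
while [ $i -lt 1000 ]; do
  test -e "$missing"                       # stat()-style miss, served from dcache
  i=$((i + 1))
done
echo "done: $i lookups of a nonexistent name"
```

Running the loop with and without negative-dentry caching (or against a network filesystem) would quantify the round-trip cost James describes.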

James


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:34     ` James Bottomley
@ 2018-07-02 22:54       ` Linus Torvalds
  2018-07-02 23:03         ` Linus Torvalds
  2018-07-02 23:19       ` Andrew Morton
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2018-07-02 22:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 3:34 PM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> There are still a lot of applications that keep looking up non-existent
> files, so I think it's still beneficial to keep them.  Apparently
> apache still looks for a .htaccess file in every directory it
> traverses, for instance.

.. or git looking for ".gitignore" files in every directory, or any
number of similar things.

Lookie here, for example:

  [torvalds@i7 linux]$ strace -e trace=%file -c git status
  On branch master
  Your branch is up to date with 'origin/master'.

  nothing to commit, working tree clean
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   73.23    0.009066           2      4056         6 open
   23.33    0.002888           2      1294      1189 openat
    1.60    0.000198          13        15         8 access
    0.80    0.000099           2        36        31 lstat
    0.53    0.000066           1        40         6 stat
    0.27    0.000033           8         4           getcwd
    0.11    0.000014          14         1           execve
    0.11    0.000014          14         1           chdir
    0.02    0.000003           3         1         1 readlink
    0.00    0.000000           0         1           unlink
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.012381                  5449      1241 total

so almost a quarter (1241 of 5449) of the file accesses resulted in
errors (and I think they are all ENOENT).

Negative lookups are *not* some unusual thing.

                   Linus


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:54       ` Linus Torvalds
@ 2018-07-02 23:03         ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2018-07-02 23:03 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 3:54 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Lookie here, for example:
>
>   [torvalds@i7 linux]$ strace -e trace=%file -c git status

So in the name of honesty, that's slightly misleading.

"git" will happily use threads for the actual index file up-to-date testing.

And that's hidden in the above numbers (because I didn't use "-f" to
follow threads), and they are all successful (because git will go an
'lstat()' on every single entry in the index file, and the index file
obviously is all valid filenames).

So the numbers quoted are closer to the

        git ls-files -o --exclude-standard

command (which doesn't check the index state, it only checks "what
non-tracked files do I have that aren't the ones I explicitly
exclude").

            Linus


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:34     ` James Bottomley
  2018-07-02 22:54       ` Linus Torvalds
@ 2018-07-02 23:19       ` Andrew Morton
  2018-07-02 23:28         ` Linus Torvalds
                           ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Andrew Morton @ 2018-07-02 23:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linus Torvalds, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, 02 Jul 2018 15:34:40 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
> > On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
> > dation.org> wrote:
> > 
> > > On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
> > > wrote:
> > > > 
> > > > A rogue application can potentially create a large number of
> > > > negative
> > > > dentries in the system consuming most of the memory available if
> > > > it
> > > > is not under the direct control of a memory controller that
> > > > enforce
> > > > kernel memory limit.
> > > 
> > > I certainly don't mind the patch series, but I would like it to be
> > > accompanied with some actual example numbers, just to make it all a
> > > bit more concrete.
> > > 
> > > Maybe even performance numbers showing "look, I've filled the
> > > dentry
> > > lists with nasty negative dentries, now it's all slower because we
> > > walk those less interesting entries".
> > > 
> > 
> > (Please cc linux-mm@kvack.org on this work)
> > 
> > Yup.  The description of the user-visible impact of current behavior
> > is far too vague.
> > 
> > In the [5/6] changelog it is mentioned that a large number of -ve
> > dentries can lead to oom-killings.  This sounds bad - -ve dentries
> > should be trivially reclaimable and we shouldn't be oom-killing in
> > such a situation.
> 
> If you're old enough, it's déjà vu; Andrea went on a negative dentry
> rampage about 15 years ago:
> 
> https://lkml.org/lkml/2002/5/24/71

That's kinda funny.

> I think the summary of the thread is that it's not worth it because
> dentries are a clean cache, so they're immediately shrinkable.

Yes, "should be".  I could understand that the presence of huge
numbers of -ve dentries could result in undesirable reclaim of
pagecache, etc.  Triggering oom-killings is very bad, and presumably
has the same cause.

Before we go and add a large amount of code to do the shrinker's job
for it, we should get a full understanding of what's going wrong.  Is
it because the dentry_lru had a mixture of +ve and -ve dentries? 
Should we have a separate LRU for -ve dentries?  Are we appropriately
aging the various dentries?  etc.

It could be that tuning/fixing the current code will fix whatever
problems inspired this patchset.

> > Dumb question: do we know that negative dentries are actually
> > worthwhile?  Has anyone checked in the past couple of
> > decades?  Perhaps our lookups are so whizzy nowadays that we don't
> > need them?
> 
> There are still a lot of applications that keep looking up non-existent 
> files, so I think it's still beneficial to keep them.  Apparently
> apache still looks for a .htaccess file in every directory it
> traverses, for instance.  Round tripping every one of these to disk
> instead of caching it as a negative dentry would seem to be a
> performance loser here.
> 
> However, actually measuring this again might be useful.

Yup.  I don't know how hard it would be to disable the -ve dentries
(the rename thing makes it sound harder than I expected) but having
real numbers to justify their continued presence might be a fun project for
someone.


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 23:19       ` Andrew Morton
@ 2018-07-02 23:28         ` Linus Torvalds
  2018-07-03  1:38         ` Waiman Long
  2018-07-03  9:18         ` Jan Kara
  2 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2018-07-02 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 4:19 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Before we go and add a large amount of code to do the shrinker's job
> for it, we should get a full understanding of what's going wrong.  Is
> it because the dentry_lru had a mixture of +ve and -ve dentries?
> Should we have a separate LRU for -ve dentries?  Are we appropriately
> aging the various dentries?  etc.
>
> It could be that tuning/fixing the current code will fix whatever
> problems inspired this patchset.

So I do think that the shrinker is likely the culprit behind the oom
issues. I think it's likely worse when you try to do some kind of
containerization, and dentries are shared.

That said, I think there are likely good reasons to limit excessive
negative dentries even outside the oom issue. Even if we did a perfect
job at shrinking them and took no time at all doing so, the fact that
you can generate an effectively infinite number of negative dentries
and then pollute the dentry hash chains with them _could_ be a
performance problem.

No sane application does that, and we handle the "obvious" cases
already: ie if you create a lot of files in a deep subdirectory and
then do "rm -rf dir", we *will* throw the negative dentries away as we
remove the directories they are in. So it is unlikely to be much of a
problem in practice. But at least in theory you can generate many
millions of negative dentries just to mess with the system, and slow
down good people.

Probably not even remotely to the point of a DoS attack, but certainly
to the point of "we're wasting time".
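A scaled-down sketch of that "mess with the system" loop (illustrative only; names and counts are arbitrary, not from the thread):

```shell
# Every failed lookup of a unique name can leave a negative dentry
# behind, so a tight loop inflates the dcache with entries no one
# will ever hit again.
dir=$(mktemp -d)
i=0
while [ $i -lt 10000 ]; do
  test -e "$dir/no-such-file-$i"   # unique miss -> a new negative dentry
  i=$((i + 1))
done
echo "generated $i negative lookups in $dir"
rmdir "$dir"   # as noted above, removing the directory drops its negative dentries
```

Scaled up to millions of iterations, this is the pattern a limiter would be defending against.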

So I do think that restricting negative dentries is a fine concept.
They are useful, but that doesn't mean that it makes sense to fill
memory with them.
                 Linus


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 21:18   ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Andrew Morton
  2018-07-02 22:21     ` Matthew Wilcox
  2018-07-02 22:34     ` James Bottomley
@ 2018-07-03  1:11     ` Waiman Long
  2018-07-03 13:48       ` Vlastimil Babka
  2 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2018-07-03  1:11 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/03/2018 05:18 AM, Andrew Morton wrote:
> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
>>> A rogue application can potentially create a large number of negative
>>> dentries in the system consuming most of the memory available if it
>>> is not under the direct control of a memory controller that enforce
>>> kernel memory limit.
>> I certainly don't mind the patch series, but I would like it to be
>> accompanied with some actual example numbers, just to make it all a
>> bit more concrete.
>>
>> Maybe even performance numbers showing "look, I've filled the dentry
>> lists with nasty negative dentries, now it's all slower because we
>> walk those less interesting entries".
>>
> (Please cc linux-mm@kvack.org on this work)
>
> Yup.  The description of the user-visible impact of current behavior is
> far too vague.
>
> In the [5/6] changelog it is mentioned that a large number of -ve
> dentries can lead to oom-killings.  This sounds bad - -ve dentries
> should be trivially reclaimable and we shouldn't be oom-killing in such
> a situation.

The OOM situation was observed in an older distro kernel. It may not be
the case with the upstream kernel. I will double check that.

Cheers,
Longman


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 23:19       ` Andrew Morton
  2018-07-02 23:28         ` Linus Torvalds
@ 2018-07-03  1:38         ` Waiman Long
  2018-07-03  9:18         ` Jan Kara
  2 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2018-07-03  1:38 UTC (permalink / raw)
  To: Andrew Morton, James Bottomley
  Cc: Linus Torvalds, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/03/2018 07:19 AM, Andrew Morton wrote:
> On Mon, 02 Jul 2018 15:34:40 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
>> On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
>>> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
>>> dation.org> wrote:
>>>
>>>> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
>>>> wrote:
>>>>> A rogue application can potentially create a large number of
>>>>> negative
>>>>> dentries in the system consuming most of the memory available if
>>>>> it
>>>>> is not under the direct control of a memory controller that
>>>>> enforce
>>>>> kernel memory limit.
>>>> I certainly don't mind the patch series, but I would like it to be
>>>> accompanied with some actual example numbers, just to make it all a
>>>> bit more concrete.
>>>>
>>>> Maybe even performance numbers showing "look, I've filled the
>>>> dentry
>>>> lists with nasty negative dentries, now it's all slower because we
>>>> walk those less interesting entries".
>>>>
>>> (Please cc linux-mm@kvack.org on this work)
>>>
>>> Yup.  The description of the user-visible impact of current behavior
>>> is far too vague.
>>>
>>> In the [5/6] changelog it is mentioned that a large number of -ve
>>> dentries can lead to oom-killings.  This sounds bad - -ve dentries
>>> should be trivially reclaimable and we shouldn't be oom-killing in
>>> such a situation.
>> If you're old enough, it's déjà vu; Andrea went on a negative dentry
>> rampage about 15 years ago:
>>
>> https://lkml.org/lkml/2002/5/24/71
> That's kinda funny.
>
>> I think the summary of the thread is that it's not worth it because
>> dentries are a clean cache, so they're immediately shrinkable.
> Yes, "should be".  I could understand that the presence of huge
> numbers of -ve dentries could result in undesirable reclaim of
> pagecache, etc.  Triggering oom-killings is very bad, and presumably
> has the same cause.
>
> Before we go and add a large amount of code to do the shrinker's job
> for it, we should get a full understanding of what's going wrong.  Is
> it because the dentry_lru had a mixture of +ve and -ve dentries? 
> Should we have a separate LRU for -ve dentries?  Are we appropriately
> aging the various dentries?  etc.

I have actually investigated having a separate LRU for negative
dentries. That would result in a far more invasive and disruptive
patch.

Another change that was suggested by a colleague is to put a newly
created -ve dentry at the tail (LRU end) of the LRU and move it to the
head only if it is accessed a second time. That would keep most of the
negative dentries at the tail, where they are more easily trimmed away.

Cheers,
Longman


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 23:19       ` Andrew Morton
  2018-07-02 23:28         ` Linus Torvalds
  2018-07-03  1:38         ` Waiman Long
@ 2018-07-03  9:18         ` Jan Kara
  2018-07-14 17:35           ` Pavel Machek
  2 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2018-07-03  9:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, Linus Torvalds, Waiman Long, Al Viro,
	Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon 02-07-18 16:19:25, Andrew Morton wrote:
> On Mon, 02 Jul 2018 15:34:40 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
> > > On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
> > > dation.org> wrote:
> > > 
> > > > On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
> > > > wrote:
> > > > > 
> > > > > A rogue application can potentially create a large number of
> > > > > negative
> > > > > dentries in the system consuming most of the memory available if
> > > > > it
> > > > > is not under the direct control of a memory controller that
> > > > > enforce
> > > > > kernel memory limit.
> > > > 
> > > > I certainly don't mind the patch series, but I would like it to be
> > > > accompanied with some actual example numbers, just to make it all a
> > > > bit more concrete.
> > > > 
> > > > Maybe even performance numbers showing "look, I've filled the
> > > > dentry
> > > > lists with nasty negative dentries, now it's all slower because we
> > > > walk those less interesting entries".
> > > > 
> > > 
> > > (Please cc linux-mm@kvack.org on this work)
> > > 
> > > Yup.  The description of the user-visible impact of current behavior
> > > is far too vague.
> > > 
> > > In the [5/6] changelog it is mentioned that a large number of -ve
> > > dentries can lead to oom-killings.  This sounds bad - -ve dentries
> > > should be trivially reclaimable and we shouldn't be oom-killing in
> > > such a situation.
> > 
> > If you're old enough, it's deja vu; Andrea went on a negative dentry
> > rampage about 15 years ago:
> > 
> > https://lkml.org/lkml/2002/5/24/71
> 
> That's kinda funny.
> 
> > I think the summary of the thread is that it's not worth it because
> > dentries are a clean cache, so they're immediately shrinkable.
> 
> Yes, "should be".  I could understand that the presence of huge
> > numbers of -ve dentries could result in undesirable reclaim of
> pagecache, etc.  Triggering oom-killings is very bad, and presumably
> has the same cause.
> 
> Before we go and add a large amount of code to do the shrinker's job
> for it, we should get a full understanding of what's going wrong.  Is
> it because the dentry_lru had a mixture of +ve and -ve dentries? 
> Should we have a separate LRU for -ve dentries?  Are we appropriately
> aging the various dentries?  etc.
> 
> It could be that tuning/fixing the current code will fix whatever
> problems inspired this patchset.

What I think is contributing to the problems, and could lead to reclaim
oddities, is the internal fragmentation of the dentry slab cache. Dentries
are relatively small, you get 21 per page on my system, so if trivially
reclaimable negative dentries get mixed with a small number of
unreclaimable positive dentries, you can get a lot of pages in the dentry
slab cache that are unreclaimable.
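Back-of-envelope check of that figure (the ~192-byte dentry size is an assumption inferred from the 21-per-page number above, not measured):

```shell
# With ~192-byte dentries, a 4096-byte slab page holds 4096/192 = 21
# objects, so a single pinned positive dentry is enough to keep a page
# holding 20 otherwise-reclaimable negative dentries in memory.
page=4096
dentry=192                    # approximate struct dentry size (assumed)
per_page=$((page / dentry))
echo "dentries per page: $per_page"
```

In the worst case, then, reclaim can free 20/21 of the objects on such a page while recovering zero pages of memory.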

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-03  1:11     ` Waiman Long
@ 2018-07-03 13:48       ` Vlastimil Babka
  0 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2018-07-03 13:48 UTC (permalink / raw)
  To: Waiman Long, Andrew Morton, Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/03/2018 03:11 AM, Waiman Long wrote:
> On 07/03/2018 05:18 AM, Andrew Morton wrote:
>> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
>>>> A rogue application can potentially create a large number of negative
>>>> dentries in the system consuming most of the memory available if it
>>>> is not under the direct control of a memory controller that enforce
>>>> kernel memory limit.
>>> I certainly don't mind the patch series, but I would like it to be
>>> accompanied with some actual example numbers, just to make it all a
>>> bit more concrete.
>>>
>>> Maybe even performance numbers showing "look, I've filled the dentry
>>> lists with nasty negative dentries, now it's all slower because we
>>> walk those less interesting entries".
>>>
>> (Please cc linux-mm@kvack.org on this work)
>>
>> Yup.  The description of the user-visible impact of current behavior is
>> far too vague.
>>
>> In the [5/6] changelog it is mentioned that a large number of -ve
>> dentries can lead to oom-killings.  This sounds bad - -ve dentries
>> should be trivially reclaimable and we shouldn't be oom-killing in such
>> a situation.
> 
> The OOM situation was observed in an older distro kernel. It may not be
> the case with the upstream kernel. I will double check that.

Note that dentries with externally allocated (long) names might have
been the factor here, until recent commits f1782c9bc547 ("dcache:
account external names as indirectly reclaimable memory") and
d79f7aa496fc ("mm: treat indirectly reclaimable memory as free in
overcommit logic").

Vlastimil

> Cheers,
> Longman
> 


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-03  9:18         ` Jan Kara
@ 2018-07-14 17:35           ` Pavel Machek
  2018-07-14 18:00             ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Machek @ 2018-07-14 17:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, James Bottomley, Linus Torvalds, Waiman Long,
	Al Viro, Linux Kernel Mailing List, linux-fsdevel, Paul McKenney,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	Wangkai (Kevin,C),
	linux-mm, Michal Hocko

> > Yes, "should be".  I could understand that the presence of huge
> > numbers of -ve dentries could result in undesirable reclaim of
> > pagecache, etc.  Triggering oom-killings is very bad, and presumably
> > has the same cause.
> > 
> > Before we go and add a large amount of code to do the shrinker's job
> > for it, we should get a full understanding of what's going wrong.  Is
> > it because the dentry_lru had a mixture of +ve and -ve dentries? 
> > Should we have a separate LRU for -ve dentries?  Are we appropriately
> > aging the various dentries?  etc.
> > 
> > It could be that tuning/fixing the current code will fix whatever
> > problems inspired this patchset.
> 
> What I think is contributing to the problems, and could lead to reclaim
> oddities, is the internal fragmentation of the dentry slab cache. Dentries
> are relatively small, you get 21 per page on my system, so if trivially
> reclaimable negative dentries get mixed with a small number of
> unreclaimable positive dentries, you can get a lot of pages in the dentry
> slab cache that are unreclaimable.

Could we allocate -ve entries from separate slab?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 17:35           ` Pavel Machek
@ 2018-07-14 18:00             ` Linus Torvalds
  2018-07-14 18:34               ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2018-07-14 18:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jan Kara, Andrew Morton, James Bottomley, Waiman Long, Al Viro,
	Linux Kernel Mailing List, linux-fsdevel, Paul McKenney,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
>
> Could we allocate -ve entries from a separate slab?

No, because negative dentries don't stay negative.

Every single positive dentry starts out as a negative dentry that is
passed in to "lookup()" to maybe be made positive.

And most of the time they *do* turn positive, because most of the
time people actually open files that exist.

But then occasionally you don't, because you're just blindly opening a
filename whether it exists or not (to _check_ whether it's there).

              Linus
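[For readers less familiar with the dcache, the lifecycle Linus describes
looks roughly like the following — a simplified pseudocode sketch in
kernel-C style, not the actual code path (which goes through
d_alloc_parallel() and friends):]

	/* Simplified sketch, not real kernel code: every slow-path lookup
	 * first allocates a *negative* dentry, then asks the filesystem
	 * whether an inode exists for it. */
	struct dentry *lookup_sketch(struct dentry *dir, struct qstr *name)
	{
		struct dentry *dentry = d_alloc(dir, name);	/* d_inode == NULL: negative */
		struct dentry *res;

		res = dir->d_inode->i_op->lookup(dir->d_inode, dentry, 0);
		/* ->lookup() attaches an inode via d_add()/d_splice_alias():
		 * inode != NULL makes the dentry positive; inode == NULL
		 * leaves a cached negative dentry behind. */
		return res ? res : dentry;
	}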


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:00             ` Linus Torvalds
@ 2018-07-14 18:34               ` Al Viro
  2018-07-14 18:36                 ` Al Viro
  2018-07-18 16:01                 ` Waiman Long
  0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2018-07-14 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Waiman Long, Linux Kernel Mailing List, linux-fsdevel,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 11:00:32AM -0700, Linus Torvalds wrote:
> On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
> >
> > Could we allocate -ve entries from a separate slab?
> 
> No, because negative dentries don't stay negative.
> 
> Every single positive dentry starts out as a negative dentry that is
> passed in to "lookup()" to maybe be made positive.
> 
> And most of the time they *do* turn positive, because most of the
> time people actually open files that exist.
> 
> But then occasionally you don't, because you're just blindly opening a
> filename whether it exists or not (to _check_ whether it's there).

BTW, one point that might not be realized by everyone: negative dentries
are *not* the hard case.
mount -t tmpfs none /mnt
touch /mnt/a
for i in `seq 100000`; do ln /mnt/a /mnt/$i; done

and you've got 100000 *unevictable* dentries, with the time per iteration
being not all that high (especially if you just call link(2) in a loop).
They are all positive and all pinned.  And you've got only one inode
there and no persistently opened files, so rlimit and quota won't help
any.


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:34               ` Al Viro
@ 2018-07-14 18:36                 ` Al Viro
  2018-07-14 18:43                   ` Linus Torvalds
  2018-07-18 16:01                 ` Waiman Long
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2018-07-14 18:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Waiman Long, Linux Kernel Mailing List, linux-fsdevel,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 07:34:45PM +0100, Al Viro wrote:
> On Sat, Jul 14, 2018 at 11:00:32AM -0700, Linus Torvalds wrote:
> > On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
> > >
> > > Could we allocate -ve entries from a separate slab?
> > 
> > No, because negative dentries don't stay negative.
> > 
> > Every single positive dentry starts out as a negative dentry that is
> > passed in to "lookup()" to maybe be made positive.
> > 
> > And most of the time they *do* turn positive, because most of the
> > time people actually open files that exist.
> > 
> > But then occasionally you don't, because you're just blindly opening a
> > filename whether it exists or not (to _check_ whether it's there).
> 
> BTW, one point that might not be realized by everyone: negative dentries
> are *not* the hard case.
> mount -t tmpfs none /mnt
> touch /mnt/a
> for i in `seq 100000`; do ln /mnt/a /mnt/$i; done
> 
> and you've got 100000 *unevictable* dentries, with the time per iteration
> being not all that high (especially if you just call link(2) in a loop).
> They are all positive and all pinned.  And you've got only one inode
> there and no persistently opened files, so rlimit and quota won't help
> any.

OK, this
        /*
         * No ordinary (disk based) filesystem counts links as inodes;
         * but each new link needs a new dentry, pinning lowmem, and
         * tmpfs dentries cannot be pruned until they are unlinked.
         */
        ret = shmem_reserve_inode(inode->i_sb);
        if (ret)
                goto out;
will probably help (on ramfs it won't, though).


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:36                 ` Al Viro
@ 2018-07-14 18:43                   ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2018-07-14 18:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Waiman Long, Linux Kernel Mailing List, linux-fsdevel,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 11:36 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> OK, this
>         /*
>          * No ordinary (disk based) filesystem counts links as inodes;
>          * but each new link needs a new dentry, pinning lowmem, and
>          * tmpfs dentries cannot be pruned until they are unlinked.
>          */
>         ret = shmem_reserve_inode(inode->i_sb);
>         if (ret)
>                 goto out;
> will probably help (on ramfs it won't, though).

Nobody who cares about memory use would use ramfs and then allow
random users on it.

I think you can exhaust memory more easily on ramfs by just writing a
huge file. Do we have any limits at all?

ramfs is fine for things like initramfs, but I think the comment says it all:

 * NOTE! This filesystem is probably most useful
 * not as a real filesystem, but as an example of
 * how virtual filesystems can be written.

and even that comment may have been more correct back in 2000 than it is today.

             Linus


* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:34               ` Al Viro
  2018-07-14 18:36                 ` Al Viro
@ 2018-07-18 16:01                 ` Waiman Long
  1 sibling, 0 replies; 18+ messages in thread
From: Waiman Long @ 2018-07-18 16:01 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Linux Kernel Mailing List, linux-fsdevel, Paul McKenney,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/14/2018 02:34 PM, Al Viro wrote:
> On Sat, Jul 14, 2018 at 11:00:32AM -0700, Linus Torvalds wrote:
>> On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
>>> Could we allocate -ve entries from a separate slab?
>> No, because negative dentries don't stay negative.
>>
>> Every single positive dentry starts out as a negative dentry that is
>> passed in to "lookup()" to maybe be made positive.
>>
>> And most of the time they *do* turn positive, because most of the
>> time people actually open files that exist.
>>
>> But then occasionally you don't, because you're just blindly opening a
>> filename whether it exists or not (to _check_ whether it's there).
> BTW, one point that might not be realized by everyone: negative dentries
> are *not* the hard case.
> mount -t tmpfs none /mnt
> touch /mnt/a
> for i in `seq 100000`; do ln /mnt/a /mnt/$i; done
>
> and you've got 100000 *unevictable* dentries, with the time per iteration
> being not all that high (especially if you just call link(2) in a loop).
> They are all positive and all pinned.  And you've got only one inode
> there and no persistently opened files, so rlimit and quota won't help
> any.

Normally you need to be root or have privileges to mount a filesystem,
right?

I am aware there is an effort going on to allow non-privileged user mounts
in containers. That could open a can of worms if it is not done properly.

With privileges, there are a lot of ways one can screw up the system, so
I am less concerned about this particular issue.

Cheers,
Longman


end of thread, other threads:[~2018-07-18 16:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1530510723-24814-1-git-send-email-longman@redhat.com>
     [not found] ` <CA+55aFyH6dHw-7R3364dn32J4p7kxT=TqmnuozCn9_Bz-MHhxQ@mail.gmail.com>
2018-07-02 21:18   ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Andrew Morton
2018-07-02 22:21     ` Matthew Wilcox
2018-07-02 22:31       ` Linus Torvalds
2018-07-02 22:34     ` James Bottomley
2018-07-02 22:54       ` Linus Torvalds
2018-07-02 23:03         ` Linus Torvalds
2018-07-02 23:19       ` Andrew Morton
2018-07-02 23:28         ` Linus Torvalds
2018-07-03  1:38         ` Waiman Long
2018-07-03  9:18         ` Jan Kara
2018-07-14 17:35           ` Pavel Machek
2018-07-14 18:00             ` Linus Torvalds
2018-07-14 18:34               ` Al Viro
2018-07-14 18:36                 ` Al Viro
2018-07-14 18:43                   ` Linus Torvalds
2018-07-18 16:01                 ` Waiman Long
2018-07-03  1:11     ` Waiman Long
2018-07-03 13:48       ` Vlastimil Babka
