linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josef Bacik <josef@toxicpanda.com>,
	kernel-team@fb.com, linux-fsdevel@vger.kernel.org,  jack@suse.cz,
	brauner@kernel.org, linux-xfs@vger.kernel.org,
	 linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-ext4@vger.kernel.org,
	 kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event on open
Date: Tue, 12 Nov 2024 00:36:12 +0100	[thread overview]
Message-ID: <CAOQ4uxg0k4bGz6zOKS+Qt5BjEqDdUhvgG+5pLBPqSCcnQdffig@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wh4BEjbfaO93hiZs3YXoNmV=YkWT4=OOhuxM3vD2S-1iA@mail.gmail.com>

On Mon, Nov 11, 2024 at 10:52 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 11 Nov 2024 at 12:19, Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3782,7 +3782,15 @@ static int do_open(struct nameidata *nd,
> > +       /*
> > +        * This permission hook is different than fsnotify_open_perm() hook.
> > +        * This is a pre-content hook that is called without sb_writers held
> > +        * and after the file was truncated.
> > +        */
> > +       return fsnotify_file_area_perm(file, MAY_OPEN, &file->f_pos, 0);
> >  }
>
> Stop adding sh*t like this to the VFS layer.
>
> Seriously. I spend time and effort looking at profiles, and then
> people who do *not* seem to spend the time and effort just willy nilly
> add fsnotify and security events and show down basic paths.
>
> I'm going to NAK any new fsnotify and permission hooks unless people
> show that they don't add any overhead.
>

FWIW we did work to eliminate overhead long before posting the new hooks.

This prep work has already been merged back in v6.9:
https://lore.kernel.org/linux-fsdevel/20240317184154.1200192-1-amir73il@gmail.com/

After working closely with Oliver and kernel test robot to eliminate
the overhead
of the pre-content hooks and on the way, also reduce overhead of the existing
fanotify permission hooks that were merged back in the day.

> Because I'm really really tired of having to wade through various
> permission hooks in the profiles that I can not fix or optimize,
> because those hoosk have no sane defined semantics, just "let user
> space know".
>
> Yes, right now it's mostly just the security layer. But this really
> looks to me like now fsnotify will be the same kind of pain.
>
> And that location is STUPID. Dammit, it is even *documented* to be
> stupid. It's a "pre-content" hook that happens after the contents have
> already been truncated. WTF? That's no "pre".
>

Yeh, I understand why it seems stupid and it may have been poorly
documented, but there is actually a good reason for the location of
this hook.

The problem with the existing fsnotify_open_perm() hook is that
with O_CREATE, open_last_lookups() takes sb_writers freeze
protection (regardless if file with really created), so we cannot
safely use this hook to start writing to file and fill its content.

So the important part of the comment is:
"This permission hook is different than fsnotify_open_perm() hook.
 This is a pre-content hook that is called without sb_writers held"

The part that follows "and after the file was truncated", simply
means that in case of O_TRUNC we won't need to fill file content,
because the file will have zero size anyway.

And for the record, it is not "pre-open" it is "pre-content-access", so the
name may be confusing but it is not wrong.

> I tried to follow the deep chain of inlines to see what actually ends
> up happening, and it looks like if the *whole* filesystem has no
> notify events at all, the fsnotify_sb_has_watchers() check will make
> this mostly go away, except for all the D$ accesses needed just to
> check for it.
>
> But even *one* entirely unrelated event will now force every single
> open to typically call __fsnotify_parent() (or possibly "just"
> fsnotify), because there's no sane "nobody cares about this dentry"
> kind of thing.
>
> So effectively this is a new hook that gets called on every single
> open call that nobody else cares about than you, and that people have
> lived without for three decades.
>

That's exactly the reason for the commit
a5e57b4d370c fsnotify: optimize the case of no permission event watchers

It is supposed to reduce overhead to bare minimum for hooks of
events from the class "permission" (FSNOTIFY_PRIO_CONTENT), which
are typically only watched for Anti-malware software, so
fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_CONTENT)
is typically false on any given fs.

And same for the new hooks for events of class "pre-content".
fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_PRE_CONTENT)
will only be true for fs of dedicated systems that run an HSM service, where
the overhead of the hooks is not going to be a concern.

> Stop it, or at least add the code to not do this all completely pointlessly.
>
> Because otherwise I will not take this kind of stuff any more. I just
> spent time trying to figure out how to avoid the pointless cache
> misses we did for every path component traversal.
>
> So I *really* don't want to see another pointless stupid fsnotify hook
> in my profiles.

I understand your frustration from crawling performance regressions, I
really do.
I am personally very grateful for the services of Oliver and his
kernel test robot
who test my development branches to catch regressions very soon in the
development
process.

We may have missed things along the way and you may yet find more issues
that justify another NAK, or more work, but you should know that a lot of care
was taken to try to avoid inflicting pain on the system.

I have been practically doing vfs prep and cleanup work together with
Josef and Jan and Christian for at least a year before we got to post v1 of the
pre-content patches.

So all I ask in return is that you consider the possibility that this is not
utter garbage and try to work with us to address your concerns.

Pre-content hooks (and HSM) is a very powerful feature that many people
are requesting and I believe that we will be able to deliver this
feature without
crapping all over the VFS.

Thanks,
Amir.


  parent reply	other threads:[~2024-11-11 23:36 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-11 20:17 [PATCH v6 00/17] fanotify: add pre-content hooks Josef Bacik
2024-11-11 20:17 ` [PATCH v6 01/17] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-11-11 20:17 ` [PATCH v6 02/17] fanotify: rename a misnamed constant Josef Bacik
2024-11-11 20:17 ` [PATCH v6 03/17] fanotify: reserve event bit of deprecated FAN_DIR_MODIFY Josef Bacik
2024-11-11 20:17 ` [PATCH v6 04/17] fsnotify: introduce pre-content permission events Josef Bacik
2024-11-11 20:17 ` [PATCH v6 05/17] fsnotify: pass optional file access range in pre-content event Josef Bacik
2024-11-11 20:17 ` [PATCH v6 06/17] fsnotify: generate pre-content permission event on open Josef Bacik
2024-11-11 21:51   ` Linus Torvalds
2024-11-11 22:46     ` Josef Bacik
2024-11-11 23:22       ` Linus Torvalds
2024-11-11 23:39         ` Linus Torvalds
2024-11-11 23:59         ` Amir Goldstein
2024-11-12  0:37           ` Linus Torvalds
2024-11-12  8:11             ` Amir Goldstein
2024-11-12 13:54               ` Jan Kara
2024-11-12 14:42                 ` Amir Goldstein
2024-11-12 14:28             ` Jan Kara
2024-11-12 15:24         ` Josef Bacik
2024-11-12 17:27           ` Linus Torvalds
2024-11-11 23:36     ` Amir Goldstein [this message]
2024-11-11 20:17 ` [PATCH v6 07/17] fsnotify: generate pre-content permission event on truncate Josef Bacik
2024-11-11 20:17 ` [PATCH v6 08/17] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-11-11 20:17 ` [PATCH v6 09/17] fanotify: report file range info with pre-content events Josef Bacik
2024-11-11 20:17 ` [PATCH v6 10/17] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-11-11 20:18 ` [PATCH v6 11/17] fanotify: add a helper to check for pre content events Josef Bacik
2024-11-11 20:18 ` [PATCH v6 12/17] fanotify: disable readahead if we have pre-content watches Josef Bacik
2024-11-11 20:18 ` [PATCH v6 13/17] mm: don't allow huge faults for files with pre content watches Josef Bacik
2024-11-11 20:18 ` [PATCH v6 14/17] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-11-11 20:18 ` [PATCH v6 15/17] xfs: add pre-content fsnotify hook for write faults Josef Bacik
2024-11-11 20:18 ` [PATCH v6 16/17] btrfs: disable defrag on pre-content watched files Josef Bacik
2024-11-11 20:18 ` [PATCH v6 17/17] fs: enable pre-content events on supported file systems Josef Bacik
2024-11-11 20:27 ` [PATCH v6 00/17] fanotify: add pre-content hooks Amir Goldstein
2024-11-11 21:55 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOQ4uxg0k4bGz6zOKS+Qt5BjEqDdUhvgG+5pLBPqSCcnQdffig@mail.gmail.com \
    --to=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=oliver.sang@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox