From: Amir Goldstein <amir73il@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josef Bacik <josef@toxicpanda.com>,
kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
brauner@kernel.org, linux-xfs@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
linux-ext4@vger.kernel.org
Subject: Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event on open
Date: Tue, 12 Nov 2024 09:11:32 +0100 [thread overview]
Message-ID: <CAOQ4uxh7aT+EvWYMa9v=SyRjfdh4Je_FmS0+TNqonHE5Z+_TPw@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgUV27XF8g23=aWNJecRbn8fCDDW2=10y9yJ122+d8JrA@mail.gmail.com>
On Tue, Nov 12, 2024 at 1:37 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 11 Nov 2024 at 16:00, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > I think that's a good idea for pre-content events, because it's fine
> > to say that if the sb/mount was not watched by a pre-content event listener
> > at the time of file open, then we do not care.
>
> Right.
>
> > The problem is that legacy inotify/fanotify watches can be added after
> > file is open, so that is allegedly why this optimization was not done for
> > fsnotify hooks in the past.
>
> So honestly, even if the legacy fsnotify hooks can't look at the file
> flag, they could damn well look at an inode flag.
>
Legacy fanotify has a mount watch (FAN_MARK_MOUNT),
which is the common way for Anti-malware to set watches on
filesystems, so I am not sure what you are saying.
> And I'm not even convinced that we couldn't fix them to just look at a
> file flag, and say "tough luck, somebody opened that file before you
> started watching, you don't get to see what they did".
That would specifically break tail -f (for inotify) and probably many other
tools, but as long as we also look at the inode flags (i_fsnotify_mask)
and the dentry flags (DCACHE_FSNOTIFY_PARENT_WATCHED),
then I think we may be able to get away with changing the semantics
for open files on a fanotify mount watch.
Specifically, I would really like to eliminate completely the cost of
FAN_ACCESS_PERM event, which could be gated on file flag, because
this is only for security/Anti-malware and I don't think this event is
practically
useful and it sure does not need to guarantee permission events to mount
watchers on already open files.
>
> So even if we don't look at a file->f_mode flag, the lergacy cases
> should look at i_fsnotify_mask, and do that *first*.
>
> IOW, not do it like fsnotify_object_watched() does now, which is just
> broken. Again, it looks at inode->i_sb->s_fsnotify_mask completely
> pointlessly, but it also does it much too late - it gets called after
> we've already called into the fsnotify() code and have messed up the
> I$ etc.
>
> The "linode->i_sb->s_fsnotify_mask" is not only an extra indirection,
> it should be very *literally* pointless. If some bit isn't set in
> i_sb->s_fsnotify_mask, then there should be no way to set that bit in
> inode->i_fsnotify_mask. So the only time we should access
> i_sb->s_fsnotify_mask is when i_notify_mask gets *modified*, not when
> it gets tested.
>
i_fsnotify_mask is the cumulative mask of all inode watchers
s_fsnotify_mask is *not* the cumulative of all i_fsnotify_mask
s_fsnotify_mask is the cumulative mask of all sb watchers
mnt_fsnotify_marks is the cumulative mask of all mount watchers
> But even if that silly and pointless i_sb->s_fsnotify_mask thing is
> removed, fsnotify_object_watched() is *still* wrong, because it
> requires that mnt_mask as an argument, which means that the caller now
> has to look it up - all this entirely pointless work that should never
> be done if the bit wasn't set in inode->i_fsnotify_mask.
>
> So I really think fsnotify is doing *everything* wrong.
Note the difference between fsnotify_sb_has_watchers()
and fsnotify_object_watched().
The former is an early optimization gate that checks if there are
any inode/mount/sb watchers (per class) on the filesystem, regardless
of specific events and specific target inode/file.
We could possibly further optimize fsnotify_sb_has_watchers() to avoid
access to ->s_fsnotify_info by setting sb flag (for each priority class).
The latter checks if any inode/mount/sb are interested in a specific
event on the said object.
In upstream code, fsnotify_object_watched() is always gated behind
fsnotify_sb_has_watchers(), which tries to prevent the indirect call.
The new fsnotify_file_has_pre_content_watches() helper could
have checked fsnotify_sb_has_priority_watchers(sb,
FSNOTIFY_PRIO_CONTENT);
but it is better to gate by file flag as you suggested.
>
> And I most certainly don't want to add more runtime hooks to
> *critical* code like open/read/write.
>
> Right now, many of the fsnotify things are for "metadata", ie for
> bigger file creation / removal / move etc. And yes, the "don't do this
> if there are no fsnotify watchers AT ALL" does actually end up meaning
> that most of the time I never see any of it in profiles, because the
> fsnotify_sb_has_watchers() culls out that case.
>
> And while the fsnotify_sb_has_watchers() thing is broken garbage and
> does too many indirections and is not testing the right thing, at
> least it's inlined and you don't get the function calls.
>
> That doesn't make fsnotify "right", but at least it's not in my face.
> I see the sb accesses, and I hate them, but it's usually at least
> hidden. Admittedly not as well hidden as it *should* be, since it does
> the access tests in the wrong order, but the old fsnotify_open()
> doesn't strike me as "terminally broken".
>
> It doesn't have a permission test after the open has already done
> things, and it's inlined enough that it isn't actively offensive.
>
> And most of the other fsnotify things have the same pattern - not
> great, but not actively offensive.
>
> These new patches make it in my face.
>
> So I do require that the *new* cases at least get it right. The fact
> that we have old code that is misdesigned and gets it wrong and should
> also be improved isn't an excuse to add *more* badly coded stuff.
>
> And yes, if somebody fixes the old fsnotify stuff to check just the
> i_fsnotify_mask in the inline function, and moves all the other silly
> checks out-of-line, that would be an improvement. I'd very much
> applaud that. But it's a separate thing from adding new hooks.
We will do both.
Thanks,
Amir.
next prev parent reply other threads:[~2024-11-12 8:11 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-11 20:17 [PATCH v6 00/17] fanotify: add pre-content hooks Josef Bacik
2024-11-11 20:17 ` [PATCH v6 01/17] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-11-11 20:17 ` [PATCH v6 02/17] fanotify: rename a misnamed constant Josef Bacik
2024-11-11 20:17 ` [PATCH v6 03/17] fanotify: reserve event bit of deprecated FAN_DIR_MODIFY Josef Bacik
2024-11-11 20:17 ` [PATCH v6 04/17] fsnotify: introduce pre-content permission events Josef Bacik
2024-11-11 20:17 ` [PATCH v6 05/17] fsnotify: pass optional file access range in pre-content event Josef Bacik
2024-11-11 20:17 ` [PATCH v6 06/17] fsnotify: generate pre-content permission event on open Josef Bacik
2024-11-11 21:51 ` Linus Torvalds
2024-11-11 22:46 ` Josef Bacik
2024-11-11 23:22 ` Linus Torvalds
2024-11-11 23:39 ` Linus Torvalds
2024-11-11 23:59 ` Amir Goldstein
2024-11-12 0:37 ` Linus Torvalds
2024-11-12 8:11 ` Amir Goldstein [this message]
2024-11-12 13:54 ` Jan Kara
2024-11-12 14:42 ` Amir Goldstein
2024-11-12 14:28 ` Jan Kara
2024-11-12 15:24 ` Josef Bacik
2024-11-12 17:27 ` Linus Torvalds
2024-11-11 23:36 ` Amir Goldstein
2024-11-11 20:17 ` [PATCH v6 07/17] fsnotify: generate pre-content permission event on truncate Josef Bacik
2024-11-11 20:17 ` [PATCH v6 08/17] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-11-11 20:17 ` [PATCH v6 09/17] fanotify: report file range info with pre-content events Josef Bacik
2024-11-11 20:17 ` [PATCH v6 10/17] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-11-11 20:18 ` [PATCH v6 11/17] fanotify: add a helper to check for pre content events Josef Bacik
2024-11-11 20:18 ` [PATCH v6 12/17] fanotify: disable readahead if we have pre-content watches Josef Bacik
2024-11-11 20:18 ` [PATCH v6 13/17] mm: don't allow huge faults for files with pre content watches Josef Bacik
2024-11-11 20:18 ` [PATCH v6 14/17] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-11-11 20:18 ` [PATCH v6 15/17] xfs: add pre-content fsnotify hook for write faults Josef Bacik
2024-11-11 20:18 ` [PATCH v6 16/17] btrfs: disable defrag on pre-content watched files Josef Bacik
2024-11-11 20:18 ` [PATCH v6 17/17] fs: enable pre-content events on supported file systems Josef Bacik
2024-11-11 20:27 ` [PATCH v6 00/17] fanotify: add pre-content hooks Amir Goldstein
2024-11-11 21:55 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAOQ4uxh7aT+EvWYMa9v=SyRjfdh4Je_FmS0+TNqonHE5Z+_TPw@mail.gmail.com' \
--to=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox