linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
	 amir73il@gmail.com, brauner@kernel.org,
	linux-xfs@vger.kernel.org,  linux-btrfs@vger.kernel.org,
	linux-mm@kvack.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event on open
Date: Tue, 12 Nov 2024 09:27:35 -0800	[thread overview]
Message-ID: <CAHk-=wgtGE9ZvqwJo8sUzX4Vzkke5O_sTFTGdQNwH1+TxOCyYQ@mail.gmail.com> (raw)
In-Reply-To: <20241112152415.GA826972@perftesting>

On Tue, 12 Nov 2024 at 07:24, Josef Bacik <josef@toxicpanda.com> wrote:
>
> But this was an entirely inappropriate way to communicate your point, even with
> people who have been here a while and are used to being yelled at.

Fair enough.

I was probably way too frustrated, because I spent some of the last
few weeks literally trying to remove two cache misses from the
permission checking path.

And those two cache misses were for features that are in POSIX made
worse because of uid translations infrastructure used for containers.

Those are both things that people actually *use* in major ways.
Admittedly the particular optimization was exactly to _not_ bother
with looking up the exact uid details when we could tell early that it
was all unnecessary, but the point was that this was to optimize
something real, something important, and not some special case.

And here's an example of a profile I've been looking at (this is a
real load: the profile is literally my "make allmodconfig" build with
not a lot of actual rebuilding needed):

  1.30%  avc_has_perm_noaudit
  1.24%  selinux_inode_permission
  1.18%  link_path_walk.part.0.constprop.0
  0.99%  btrfs_getattr
  0.82%  clear_page_rep
  0.82%  security_inode_permission
  0.60%  dput
  0.60%  path_lookupat
  0.60%  __check_object_size
  0.54%  strncpy_from_user
  0.51%  inode_permission
  0.50%  generic_permission
  0.47%  kmem_cache_alloc_noprof
  0.47%  step_into
  0.46%  btrfs_permission
  0.45%  cp_new_stat
  0.44%  filename_lookup

and hey, you go "most of those are just around half a percent". Sure.
But add those things up, and you'll see that they add up to about 12%.

And yes, that 12% is 1/8th of the whole load. So it's not some "just
12% of the kernel footprint". It's literally 12% of one of the main
loads I run day-to-day, which is why I care about these paths so
deeply.

And look at the two top offenders. No, they aren't fsnotify. But guess
what they are? They are hooks in the VFS layer that the VFS code
cannot do anything about and cannot optimize as a result. They do
ABSOLUTELY NOTHING on this load, and they add no actual value.

If they had some kind of "I don't have any special security label"
test, the two top offenders (and down that list you can find a third
one) would likely just not exist at all. But they aren't "real" VFS
functions, they are just hooks into a black hole with random semantics
that is set by user-supplied tables, so we don't have that.

So this is why I'm putting my foot down. No more badly coded and badly
designed hooks that the VFS layer can't just optimize away for the
common case.

I can't fix the existing issues. But I can say "no more". If you want
new hooks, they had better have effectively *ZERO* overhead when they
aren't relevant.

And yes, I was too forceful. Sorry. But my foot stays down.

If you want a specialty hook for specialty uses, that hook needs to be
so *fundamentally* low-cost that I simply don't need to worry about
it.

It needs to have a flag that doesn't take a cache miss that says
"Nobody cares", and doesn't call out to random pointless functions
that cause I$ misses either.

I really wish I had made that requirement for the security people all
those years ago. Because 99% of the time, all that stupid selinux
noise is literally worthless and does nothing. But we don't have the
flag that says "nothing to see".

               Linus


  reply	other threads:[~2024-11-12 17:28 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-11 20:17 [PATCH v6 00/17] fanotify: add pre-content hooks Josef Bacik
2024-11-11 20:17 ` [PATCH v6 01/17] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-11-11 20:17 ` [PATCH v6 02/17] fanotify: rename a misnamed constant Josef Bacik
2024-11-11 20:17 ` [PATCH v6 03/17] fanotify: reserve event bit of deprecated FAN_DIR_MODIFY Josef Bacik
2024-11-11 20:17 ` [PATCH v6 04/17] fsnotify: introduce pre-content permission events Josef Bacik
2024-11-11 20:17 ` [PATCH v6 05/17] fsnotify: pass optional file access range in pre-content event Josef Bacik
2024-11-11 20:17 ` [PATCH v6 06/17] fsnotify: generate pre-content permission event on open Josef Bacik
2024-11-11 21:51   ` Linus Torvalds
2024-11-11 22:46     ` Josef Bacik
2024-11-11 23:22       ` Linus Torvalds
2024-11-11 23:39         ` Linus Torvalds
2024-11-11 23:59         ` Amir Goldstein
2024-11-12  0:37           ` Linus Torvalds
2024-11-12  8:11             ` Amir Goldstein
2024-11-12 13:54               ` Jan Kara
2024-11-12 14:42                 ` Amir Goldstein
2024-11-12 14:28             ` Jan Kara
2024-11-12 15:24         ` Josef Bacik
2024-11-12 17:27           ` Linus Torvalds [this message]
2024-11-11 23:36     ` Amir Goldstein
2024-11-11 20:17 ` [PATCH v6 07/17] fsnotify: generate pre-content permission event on truncate Josef Bacik
2024-11-11 20:17 ` [PATCH v6 08/17] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-11-11 20:17 ` [PATCH v6 09/17] fanotify: report file range info with pre-content events Josef Bacik
2024-11-11 20:17 ` [PATCH v6 10/17] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-11-11 20:18 ` [PATCH v6 11/17] fanotify: add a helper to check for pre content events Josef Bacik
2024-11-11 20:18 ` [PATCH v6 12/17] fanotify: disable readahead if we have pre-content watches Josef Bacik
2024-11-11 20:18 ` [PATCH v6 13/17] mm: don't allow huge faults for files with pre content watches Josef Bacik
2024-11-11 20:18 ` [PATCH v6 14/17] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-11-11 20:18 ` [PATCH v6 15/17] xfs: add pre-content fsnotify hook for write faults Josef Bacik
2024-11-11 20:18 ` [PATCH v6 16/17] btrfs: disable defrag on pre-content watched files Josef Bacik
2024-11-11 20:18 ` [PATCH v6 17/17] fs: enable pre-content events on supported file systems Josef Bacik
2024-11-11 20:27 ` [PATCH v6 00/17] fanotify: add pre-content hooks Amir Goldstein
2024-11-11 21:55 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgtGE9ZvqwJo8sUzX4Vzkke5O_sTFTGdQNwH1+TxOCyYQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox