From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 508CCD41D69 for ; Tue, 12 Nov 2024 08:11:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 528908D0009; Tue, 12 Nov 2024 03:11:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4D9058D0001; Tue, 12 Nov 2024 03:11:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 378E08D0009; Tue, 12 Nov 2024 03:11:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 14F948D0001 for ; Tue, 12 Nov 2024 03:11:47 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id B57634112D for ; Tue, 12 Nov 2024 08:11:46 +0000 (UTC) X-FDA: 82776722910.10.56E68AF Received: from mail-qk1-f179.google.com (mail-qk1-f179.google.com [209.85.222.179]) by imf25.hostedemail.com (Postfix) with ESMTP id EB32DA0007 for ; Tue, 12 Nov 2024 08:11:14 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=muZ8VCIu; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf25.hostedemail.com: domain of amir73il@gmail.com designates 209.85.222.179 as permitted sender) smtp.mailfrom=amir73il@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731398929; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=GYjdu3MIu3Eu/Ypiw444kSD1a+HXjbP7A8QKym9LvII=; b=27sqbZ/ICKCWAMu89/V+GudqCB4Ft2MTx1ZnmBzyJAR+HiEqHgQyo6jnA6YoOfi7KfXOMz CmEDN2mkbDCZjVSRSJEpSPaLldDdVzqILISWaHM6GjWVUoRUzfBz6D0rJupfCyfOyIk1cH VDI30D2oSYsuS1IOzZeimeYgDS9wpvA= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=muZ8VCIu; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf25.hostedemail.com: domain of amir73il@gmail.com designates 209.85.222.179 as permitted sender) smtp.mailfrom=amir73il@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731398929; a=rsa-sha256; cv=none; b=MgMjqA+zWVq+ysa0Ru3EUTV/gxwq2oOlsZpeYDujRuSQ1UKTHxZQ7MC1blngRy4PcOvjxh OTOk3o4gMSGeMpsdIygl5QosYNBF5KeLbOFdmk4IdtRU/sXd4RTE0iL+QgTusJnSMeY0TV qXzdP17751y74zqen/fV6ebxO0kLEIc= Received: by mail-qk1-f179.google.com with SMTP id af79cd13be357-7b1539faa0bso348444785a.1 for ; Tue, 12 Nov 2024 00:11:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731399104; x=1732003904; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=GYjdu3MIu3Eu/Ypiw444kSD1a+HXjbP7A8QKym9LvII=; b=muZ8VCIuLdvpLV0fyolz/+L6UtZYdHZSnz7n+53lxsWQUDBDgOA3xBDIKvRh3+NYFV Dzot4lSt6hrfI3nsptN01CGhP9bpDMO4SqpIbBIuara/6tYsAbMAhVgkmoQqZsJtdGpA CGDqtRcOT9U9v+rUDyIjY+XZAT10eFMF8BYVm5kye/vtYwNQxz7n7I+A+1QWRxS3JHmQ 8jZvkVL/zcY93245uipAqFe3MZe/tMo/x27fuj5X5Gtat3pBNC22OZFnigewYa/jUUi6 lCneYiCzcZXaKze95tl+tHObWfpySxfiv6EG+uxyOBW7sl4Z480rwpYM2S6IjKhpv/th k35Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731399104; x=1732003904; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GYjdu3MIu3Eu/Ypiw444kSD1a+HXjbP7A8QKym9LvII=; b=ZMfa/VBIqN+qaMZ0gKzCYGI06OlTEmlga4RXTehoZp03VExwUM2Pf1moOhWmoPZw5q bMaBOTzgCNbVSoMP8v33Q4LjslJ9hzwsp3ErGMn0qmNgZGJq+vfEYkCKv6Kkmn/j9rzx Lc972bmBM6s2ETRlF57xtPCeus58j12EcP8qd5r7fc/JbUKHzwaFPlwd/CRjxYBXf8OH j3KgHs18YWUfIrKmj/EeEIqqZYq6k1E5ARKvNZThLcPyN2KLOdLJyMoqHueSv6+ivRuV ExX50GJOKwvZC7/wHkCWOEmUWtjGcO+/1O3aWqk9vrAMr+mKr95FbK2B5N+ythZylDxr s4aQ== X-Forwarded-Encrypted: i=1; AJvYcCXilNXRbsLp3Ff0J/s9+zGxuboOO7/q+JDv/eXC+tT30OQspmHGtwEmZZ2jdDlfZkb9ddYJbXSZvQ==@kvack.org X-Gm-Message-State: AOJu0YyhsMLjruH+EqHMjLriawbCcmezlftclOsvb7x6vEBOSypdSf0m LTDYIZ6nSUTlggYvdlI3y+cGzP8I9evdzcgVBNn6sDnceMfG749wSpWqSnuJiMMB20sUrqurXVW ru47DunYFOszet+rp77zUDgW15Jw= X-Google-Smtp-Source: AGHT+IF8PhChDkQIPeKy040j30c4YpBejToxm9PopIZQRH7oO6q4EPvXD1ptT5PmED5D1+9XHwBRPyTqMw7gXqxqlng= X-Received: by 2002:a05:6214:2b92:b0:6ce:2f4e:40f1 with SMTP id 6a1803df08f44-6d39e1507famr237610156d6.26.1731399103800; Tue, 12 Nov 2024 00:11:43 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Amir Goldstein Date: Tue, 12 Nov 2024 09:11:32 +0100 Message-ID: Subject: Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event on open To: Linus Torvalds Cc: Josef Bacik , kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz, brauner@kernel.org, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-mm@kvack.org, linux-ext4@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: EB32DA0007 X-Stat-Signature: fk9wr8yuotxpz8bs78g4hy669xwjnhqe X-Rspam-User: X-HE-Tag: 1731399074-126753 X-HE-Meta: U2FsdGVkX19bYctdKwxC2Ih0UFQTDM/KCpksD1tS70pfzcOWibqgBJpUxXoikjkf2PPILRQEZxNBIAAMvEpU72Vn9ITiiFjyIUK5ubhN+MGoqlJrnI/6nW2FJEkTBfHhSQ6ziP5AfJdsd526Wjb9LjawBT4dATOdvSbUunmk3NiyyWVjrdkskuJtTbrvMyacm3v0zmE9IfT2e+i8znULg/LTWjsqCwFM+uCAbLxgIdy/klX8waFGVtZcfoCR2QVOEusflSrNHWBMrLGDf3DG8i3Q8vL+98YgH9Qla7ZWv8wS1gkpjZ3eNfENbf9FP1ZxZFiFebRUsDNgqsb4QgCM+oPvGamqQc7Beev0xEGQDKjzjQNx+lMum8Lzg7na2UV2+6dL1CmCJ1MD7d/vZi7lOTsPfpIJvtXliViJOdWjOVdFBuWaF3fMAb8AGT0OnJH9IJY+msWwZmdfiNLXVXCdHHWrqHYZAteDAmaA4X1I0B6DgCdLtpwZcscN6GC+Lg9Tk0rO8GlbFzR/HdwH8kuIKFweVci18rarbqQvx5zPxeKaZr2a2fACdKx54ta3XOXjOX9mHwbCXf4ZXxt/vjVRUp4RnOyvvflkQZFR94mKi2raQhQxwf5HXE+d2rPMvF3ik35rw/EYSWib6KTYV/47szKQG/lIgBwnDSI4WxZGSTRD5PywJrWQRqCNSCfA0F9DANKVZZTmKEEQIYh9suQpCNePyDbx06JDPKh3YmW/BxqYCGoQDGqFox6Xa2hnZRwpNCrHwZ0lr2+a1mmQWTIOA+kyBUF52I2ZA25grDAkw2chnmVZWchvgeqRhgmhfCCx2kVLaDw2ZM0bcnCUOJ892YoMobQSK3+I3HQvgV9q47lIYl+V/ZEkQvNIIb144fyL0rRUWlDCaERXtPG3hVTcWBRyCXnEMs2tmTf8c8qrCJLc1nZ3rYzdw3G6UaEJHn09xQ1MzQRKZl9W5yW4Nti AQHY1jFe nRhj1HNEW14ewpi8Zl0OFS0KR4Odp2YEUOl+sUOX5EhtD8arpbqc+M60Mc9GSkfODmcwGaKYMD0PKMeOpWozh1h0E37yXB4wT8ZMuG/nOaPbo4T+99XlrSi+CAkQiqT2OnAdAuMoqe3B5ZJGK3uuz7x93kAmm1Gv+HuM+nBiJsVMUXysXGUuWAm07IS8y07JamsLYHAD/4TAj0iCuF64li/8XtVgKJKXTA3zN2KlZgcK7fT4nNnJ2DeXStTHGum6xHe4LhPZMUGdrt/LxbBUdjqUKXtBoK20V9/AvpbZpmf/et+ZYGjpP+1nzzaUGcgj94YWF+waH/y5thyXZYQMqz5ydKh0MjhW7gNmS24auEkfyjvqGU/XXdK0mvNzlWbWeAPxgN+Qihuq2MGtTraaHrR7/9w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Nov 12, 2024 at 1:37=E2=80=AFAM Linus Torvalds wrote: > > On Mon, 11 Nov 2024 at 16:00, Amir Goldstein wrote: > > > > I think that's a good idea for pre-content events, because it's fine > > to say that if the sb/mount was not watched by a pre-content event list= ener > > at the time of file open, then we do not care. > > Right. > > > The problem is that legacy inotify/fanotify watches can be added after > > file is open, so that is allegedly why this optimization was not done f= or > > fsnotify hooks in the past. > > So honestly, even if the legacy fsnotify hooks can't look at the file > flag, they could damn well look at an inode flag. > Legacy fanotify has a mount watch (FAN_MARK_MOUNT), which is the common way for Anti-malware to set watches on filesystems, so I am not sure what you are saying. > And I'm not even convinced that we couldn't fix them to just look at a > file flag, and say "tough luck, somebody opened that file before you > started watching, you don't get to see what they did". That would specifically break tail -f (for inotify) and probably many other tools, but as long as we also look at the inode flags (i_fsnotify_mask) and the dentry flags (DCACHE_FSNOTIFY_PARENT_WATCHED), then I think we may be able to get away with changing the semantics for open files on a fanotify mount watch. Specifically, I would really like to eliminate completely the cost of FAN_ACCESS_PERM event, which could be gated on file flag, because this is only for security/Anti-malware and I don't think this event is practically useful and it sure does not need to guarantee permission events to mount watchers on already open files. > > So even if we don't look at a file->f_mode flag, the lergacy cases > should look at i_fsnotify_mask, and do that *first*. > > IOW, not do it like fsnotify_object_watched() does now, which is just > broken. Again, it looks at inode->i_sb->s_fsnotify_mask completely > pointlessly, but it also does it much too late - it gets called after > we've already called into the fsnotify() code and have messed up the > I$ etc. > > The "linode->i_sb->s_fsnotify_mask" is not only an extra indirection, > it should be very *literally* pointless. If some bit isn't set in > i_sb->s_fsnotify_mask, then there should be no way to set that bit in > inode->i_fsnotify_mask. So the only time we should access > i_sb->s_fsnotify_mask is when i_notify_mask gets *modified*, not when > it gets tested. > i_fsnotify_mask is the cumulative mask of all inode watchers s_fsnotify_mask is *not* the cumulative of all i_fsnotify_mask s_fsnotify_mask is the cumulative mask of all sb watchers mnt_fsnotify_marks is the cumulative mask of all mount watchers > But even if that silly and pointless i_sb->s_fsnotify_mask thing is > removed, fsnotify_object_watched() is *still* wrong, because it > requires that mnt_mask as an argument, which means that the caller now > has to look it up - all this entirely pointless work that should never > be done if the bit wasn't set in inode->i_fsnotify_mask. > > So I really think fsnotify is doing *everything* wrong. Note the difference between fsnotify_sb_has_watchers() and fsnotify_object_watched(). The former is an early optimization gate that checks if there are any inode/mount/sb watchers (per class) on the filesystem, regardless of specific events and specific target inode/file. We could possibly further optimize fsnotify_sb_has_watchers() to avoid access to ->s_fsnotify_info by setting sb flag (for each priority class). The latter checks if any inode/mount/sb are interested in a specific event on the said object. In upstream code, fsnotify_object_watched() is always gated behind fsnotify_sb_has_watchers(), which tries to prevent the indirect call. The new fsnotify_file_has_pre_content_watches() helper could have checked fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_CONTENT); but it is better to gate by file flag as you suggested. > > And I most certainly don't want to add more runtime hooks to > *critical* code like open/read/write. > > Right now, many of the fsnotify things are for "metadata", ie for > bigger file creation / removal / move etc. And yes, the "don't do this > if there are no fsnotify watchers AT ALL" does actually end up meaning > that most of the time I never see any of it in profiles, because the > fsnotify_sb_has_watchers() culls out that case. > > And while the fsnotify_sb_has_watchers() thing is broken garbage and > does too many indirections and is not testing the right thing, at > least it's inlined and you don't get the function calls. > > That doesn't make fsnotify "right", but at least it's not in my face. > I see the sb accesses, and I hate them, but it's usually at least > hidden. Admittedly not as well hidden as it *should* be, since it does > the access tests in the wrong order, but the old fsnotify_open() > doesn't strike me as "terminally broken". > > It doesn't have a permission test after the open has already done > things, and it's inlined enough that it isn't actively offensive. > > And most of the other fsnotify things have the same pattern - not > great, but not actively offensive. > > These new patches make it in my face. > > So I do require that the *new* cases at least get it right. The fact > that we have old code that is misdesigned and gets it wrong and should > also be improved isn't an excuse to add *more* badly coded stuff. > > And yes, if somebody fixes the old fsnotify stuff to check just the > i_fsnotify_mask in the inline function, and moves all the other silly > checks out-of-line, that would be an improvement. I'd very much > applaud that. But it's a separate thing from adding new hooks. We will do both. Thanks, Amir.