From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EEFCD41C17 for ; Wed, 13 Nov 2024 08:51:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0FC3A6B00B0; Wed, 13 Nov 2024 03:51:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 083EA6B00B1; Wed, 13 Nov 2024 03:51:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E65F56B00C2; Wed, 13 Nov 2024 03:51:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id C6BC86B00B0 for ; Wed, 13 Nov 2024 03:51:10 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 31DA2140928 for ; Wed, 13 Nov 2024 08:51:10 +0000 (UTC) X-FDA: 82780450956.10.5F8A271 Received: from mail-qt1-f170.google.com (mail-qt1-f170.google.com [209.85.160.170]) by imf06.hostedemail.com (Postfix) with ESMTP id 1964518000E for ; Wed, 13 Nov 2024 08:50:37 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=PhiUgFIj; spf=pass (imf06.hostedemail.com: domain of amir73il@gmail.com designates 209.85.160.170 as permitted sender) smtp.mailfrom=amir73il@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731487780; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ybXyojIgfEt1kNw22/DBLy4j99prnhZqAaMZR54A+Gg=; b=GthhXDI2L/IAh6zGHneg6WgieOgxHYhb2Zn6YAtcXdj3FvTHZ2RulUq59ZmOhOP752ups1 wcgHo0VUtF7D4uuMqknVkrR+I8vmmbJWtKG2twsJLNa74R2isrZLhnOhJxI86kBc6m/1xc nIN1Zbg6XSE6aagL0RwBDs0VgxDaSyI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731487780; a=rsa-sha256; cv=none; b=vEuR6t1Lj8S4czazJDbawYXLTQTYmmdnvgTSzI5nIQNBdZHb+G0ZVa/OMiSJ3FgsTinrzQ o/lcYyAuemqMbR+9f4V63T++SYqcvFjS+gPyp1/spURxy3Gt22dVYYWQMIn11h64RB0VNX 04gCXQP2Cgdz5Qj7AwPwltlUe+7pvbU= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=PhiUgFIj; spf=pass (imf06.hostedemail.com: domain of amir73il@gmail.com designates 209.85.160.170 as permitted sender) smtp.mailfrom=amir73il@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-qt1-f170.google.com with SMTP id d75a77b69052e-460af1a1154so44450481cf.0 for ; Wed, 13 Nov 2024 00:51:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731487867; x=1732092667; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ybXyojIgfEt1kNw22/DBLy4j99prnhZqAaMZR54A+Gg=; b=PhiUgFIjPIiRTTQmScuNAXeX0bIB1SxuUG6/c1oQNpVOSriJ5TOZ7gEWjHDd9AC4AE 0eeE+FJ/MRyCPn9zflJWHzvVKQDuzb2YkGo6iIyDnUMGTTvUrDGrulQYeluSuOzNXz/l zSSEDwW06lWhIdcDLMdHeQjltkD1+sMwWaGVPElPk4/JfUBQiL++StP2Eh4GKurMeN4q 59roGh9LIOnfa1agidbvu6QQeFVj/MG1VPHzR6EWU3mAbJswHjQecTjOE8GofgV1ZiQJ INKM5uJU2km1dS82uKbbA2Z4oehw7vBg6qcP30pixRQguBTal2ZBY9zfxZpSUzA/8uzH Ti5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731487867; x=1732092667; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ybXyojIgfEt1kNw22/DBLy4j99prnhZqAaMZR54A+Gg=; b=rETziJdlGAnT2QT74MUVIp0oquXDbyblxKRddy7trSq9RYwwe/5ZTByzqo/YDXXenL m0TzkOMlVkgXsb5gau/oPytigKKuJT/GWC3dHuJW/uXyZtFxqXioQ+UoMIfRh74PnjPc tfhiTWByNB7GJdbLbuzWB6Ht8iU3msbpTtOmnBLc7Fe1e0kamSAEH09aiNAe0MlJ9YMS T2YLxf476EyWEKxFwd13ruZuHb4xEnZDb4LopuEksO5WNnYlHPBYEO8P+jpLEOWTSV1U vBbiSksTavv7U4udsBd5Qwn1QUhRsKmI0yGFpSHuWAtoQ7P1UPAADkAd/gU1s8mLreTZ 0DDQ== X-Forwarded-Encrypted: i=1; AJvYcCXPGYCGzDqpvUTh8IQ4MiPfpKN59QGV3H4PbFpPzvustlix7+upWTXAOQrUxm7XOiLITSCJSFazHA==@kvack.org X-Gm-Message-State: AOJu0YyGyGzY06hBxQ6zFKeRjvqJJAUTJIY5LB7Md6gvJ3YMoBYVZoA3 GX0ur7W+t71RzQk/Zf/4zXE79dQL0vM2X3FLEUCy5f2bM5VjEHBZgK3CRFAmd7Zaco8APpG4GY2 xcFLND8mqcX+ILjCM9VMzncBKwRE= X-Google-Smtp-Source: AGHT+IFBrfXQjTCkQPe4lwAc7PKO7knKqlZWmbapahrCNZ9X4yKTYmSvJNyTl2Dazx088ITWw0ve2s6iPmsqRoKVlow= X-Received: by 2002:a05:622a:17c5:b0:461:2150:d59c with SMTP id d75a77b69052e-4634b498484mr29491901cf.9.1731487867334; Wed, 13 Nov 2024 00:51:07 -0800 (PST) MIME-Version: 1.0 References: <141e2cc2dfac8b2f49c1c8d219dd7c20925b2cef.1731433903.git.josef@toxicpanda.com> <20241113001251.GF3387508@ZenIV> <20241113011954.GG3387508@ZenIV> <20241113043003.GH3387508@ZenIV> In-Reply-To: <20241113043003.GH3387508@ZenIV> From: Amir Goldstein Date: Wed, 13 Nov 2024 09:50:55 +0100 Message-ID: Subject: Re: [PATCH v7 05/18] fsnotify: introduce pre-content permission events To: Al Viro Cc: Linus Torvalds , Josef Bacik , kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz, brauner@kernel.org, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-mm@kvack.org, linux-ext4@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam10 X-Stat-Signature: xgug4m55jo3jo8w5nrhp85tsrxfepmgk X-Rspamd-Queue-Id: 1964518000E X-Rspam-User: X-HE-Tag: 1731487837-871116 X-HE-Meta: U2FsdGVkX18us/XrnjPaJ1UHCURD9kdgnf7oxV6WWBPzQGFBnERtbMPd+zUyaYE4peN0v5I7WgVVYrvhcsqb2nfbo7bIa2cYxaVRz4fWTxnsSoDAF/XZ4JuIYV6jhQcRNdm4ZhmEQvjaZpcyxSkmy/3yFA2mRBMs/99qA0/HNAWFyhB5QixmezY82OtW0f2WQCGhgnFsKakqJFHoCKEhQT+b96iDNlvCMNVhEslE3L1KG2A2/kT+I1F9RMtRX3mdIN645HaEI/ffQ4nTLU9FuvGtW9BSqUHvr78P+w4udr6gVdgvDVo6HlZfQgZzaI0ABr5EOG/6OUfMS85BwiJ5d+zDSoGKgGOaW2lDCkD3JasBWvTz1VuMsx25DzWpe47vOwmDnumQR8blX1lF+eLPX++UtpBVicmYYab7SS1T8Mclfb0482GTZXRx6zC3TN6AY98wrYxnirFeiAZwZEfTHBm6nt38IkeV2NAINjPjF/GEx/CCxku8Qzp3TH5GKMBOzbn65GM3GlDYUgQWwozv5mG9ji2gp3BmDN3nZsYF1nskHAYnrv6opM0s00PC4yhIHwzJOOM6mrtiHAGhI2iuuK8b85441WECAdQbYD4mbWiEqNLNQ5Fr6habuFUj7BnqvYMqQFeth1gfd7kbqtiFW+558m2w7HPtdaSvN8UYr3SrEC9lhtS8wTOii0z7E+CjwUOs3XBSqD0JCe3wFMHHQ2phLQWtaq7ebGzurTjVAwbaR7t77xSoatccUssI8N111UmC8CjjqCLF9mcxaRPYuLuh1GuFLBjFqpe9p8+cEPBMkuQ/+au1rd4qm/ZgMUyDC01Ue8iDx3BzqFvgrCmIGqy4TKAxKggXoarH6NlEvfSTR36H4TYDW/PisRmqRXCJp0peZdf5sAVfT89yxchGt+WfvIF6alGlnQukB8oJpVFU90vVAWiQdaEgHPV9FvYSEOceB/r148dFMnPzUZH usJtXJVm KYTtC+IHGWOAIbj88BQdFgrITP224N3mlzD5HoMV3ClOyrIum0I0sEKNaKnvG+6SFW1sveb67LlLdqkDklgbbQxSlC6ozoFpqbFQNLF/I9jfpQIPmniQK4yLFjVLtx1zruu5pnrjVSrFTOnJSe+LNFPfQHDjAyxXqMbxCWsM2H7Cd5kAHFCeemZjKZkjXwIxmY/0euof/Fr+PV+hMtqI/4lE+pUmcIOmodZdJR0dqTpu6zh2eQhOu7880GSnEIho0egZ9bJRY22sUSKWA11dhCuodKiNsNW2an2KdmuqCBROcAyo6VQ9cn+MYJ1XIHusvstp0mX6jIOEv/hoGgg76X2X0tj9GccYxKgRk1n0uq7vRRkWHaPVZi8+XSp7qfTF+MxJz X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Nov 13, 2024 at 5:30=E2=80=AFAM Al Viro w= rote: > > On Wed, Nov 13, 2024 at 01:19:54AM +0000, Al Viro wrote: > > On Tue, Nov 12, 2024 at 04:38:42PM -0800, Linus Torvalds wrote: > > > Looking at that locking code in fadvise() just for the f_mode use doe= s > > > make me think this would be a really good cleanup. > > > > > > I note that our fcntl code seems buggy as-is, because while it does > > > use f_lock for assignments (good), it clearly does *not* use them for > > > reading. > > > > > > So it looks like you can actually read inconsistent values. > > > > > > I get the feeling that f_flags would want WRITE_ONCE/READ_ONCE in > > > _addition_ to the f_lock use it has. > > > > AFAICS, fasync logics is the fishy part - the rest should be sane. > > > > > The f_mode thing with fadvise() smells like the same bug. Just becaus= e > > > the modifications are serialized wrt each other doesn't mean that > > > readers are then automatically ok. > > > > Reads are also under ->f_lock in there, AFAICS... > > > > Another thing in the vicinity is ->f_mode modifications after the calls > > of anon_inode_getfile() in several callers - probably ought to switch > > those to anon_inode_getfile_fmode(). That had been discussed back in > > April when the function got merged, but "convert to using it" followup > > series hadn't materialized... > > While we are at it, there's is a couple of kludges I really hate - > mixing __FMODE_NONOTIFY and __FMODE_EXEC with O_... flags. > > E.g. for __FMODE_NONOTIFY all it takes is switching fanotify from > anon_inode_getfd() to anon_inode_getfile_fmode() and adding > a dentry_open_nonotify() to be used by fanotify on the other path. > That's it - no more weird shit in OPEN_FMODE(), etc. > > For __FMODE_EXEC it might get trickier (nfs is the main consumer), > but I seriously suspect that something like "have path_openat() > check op->acc_mode & MAY_EXEC and set FMODE_EXEC in ->f_mode > right after struct file allocation" would make a good starting > point; yes, it would affect uselib(2), but... I've no idea whether > it wouldn't be the right thing to do; would be hard to test. > > Anyway, untested __FMODE_NONOTIFY side of it: > > diff --git a/fs/fcntl.c b/fs/fcntl.c > index 22dd9dcce7ec..ebd1c82bfb6b 100644 > --- a/fs/fcntl.c > +++ b/fs/fcntl.c > @@ -1161,10 +1161,10 @@ static int __init fcntl_init(void) > * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY > * is defined as O_NONBLOCK on some platforms and not on others. > */ > - BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ !=3D > + BUILD_BUG_ON(20 - 1 /* for O_RDONLY being 0 */ !=3D > HWEIGHT32( > (VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) | > - __FMODE_EXEC | __FMODE_NONOTIFY)); > + __FMODE_EXEC)); > > fasync_cache =3D kmem_cache_create("fasync_cache", > sizeof(struct fasync_struct), 0, > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fano= tify_user.c > index 9644bc72e457..43fbf29ef03a 100644 > --- a/fs/notify/fanotify/fanotify_user.c > +++ b/fs/notify/fanotify/fanotify_user.c > @@ -101,8 +101,7 @@ static void __init fanotify_sysctls_init(void) > * > * Internal and external open flags are stored together in field f_flags= of > * struct file. Only external open flags shall be allowed in event_f_fla= gs. > - * Internal flags like FMODE_NONOTIFY, FMODE_EXEC, FMODE_NOCMTIME shall = be > - * excluded. > + * Internal flags like FMODE_EXEC shall be excluded. > */ > #define FANOTIFY_INIT_ALL_EVENT_F_BITS (= \ > O_ACCMODE | O_APPEND | O_NONBLOCK | \ > @@ -262,8 +261,8 @@ static int create_fd(struct fsnotify_group *group, co= nst struct path *path, > * we need a new file handle for the userspace program so it can = read even if it was > * originally opened O_WRONLY. > */ > - new_file =3D dentry_open(path, > - group->fanotify_data.f_flags | __FMODE_NON= OTIFY, > + new_file =3D dentry_open_nonotify(path, > + group->fanotify_data.f_flags, > current_cred()); > if (IS_ERR(new_file)) { > /* > @@ -1404,6 +1403,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags,= unsigned int, event_f_flags) > unsigned int fid_mode =3D flags & FANOTIFY_FID_BITS; > unsigned int class =3D flags & FANOTIFY_CLASS_BITS; > unsigned int internal_flags =3D 0; > + struct file *file; > > pr_debug("%s: flags=3D%x event_f_flags=3D%x\n", > __func__, flags, event_f_flags); > @@ -1472,7 +1472,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags,= unsigned int, event_f_flags) > (!(fid_mode & FAN_REPORT_NAME) || !(fid_mode & FAN_REPORT_FID= ))) > return -EINVAL; > > - f_flags =3D O_RDWR | __FMODE_NONOTIFY; > + f_flags =3D O_RDWR; > if (flags & FAN_CLOEXEC) > f_flags |=3D O_CLOEXEC; > if (flags & FAN_NONBLOCK) > @@ -1550,10 +1550,18 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flag= s, unsigned int, event_f_flags) > goto out_destroy_group; > } > > - fd =3D anon_inode_getfd("[fanotify]", &fanotify_fops, group, f_fl= ags); > + fd =3D get_unused_fd_flags(flags); > if (fd < 0) > goto out_destroy_group; > > + file =3D anon_inode_getfile_fmode("[fanotify]", &fanotify_fops, g= roup, > + f_flags, FMODE_NONOTIFY); > + if (IS_ERR(file)) { > + fd =3D PTR_ERR(file); > + put_unused_fd(fd); > + goto out_destroy_group; > + } > + fd_install(fd, file); > return fd; > > out_destroy_group: > diff --git a/fs/open.c b/fs/open.c > index acaeb3e25c88..04cb581528ff 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -1118,6 +1118,23 @@ struct file *dentry_open(const struct path *path, = int flags, > } > EXPORT_SYMBOL(dentry_open); > > +struct file *dentry_open_nonotify(const struct path *path, int flags, > + const struct cred *cred) > +{ > + struct file *f =3D alloc_empty_file(flags, cred); > + if (!IS_ERR(f)) { > + int error; > + > + f->f_mode |=3D FMODE_NONOTIFY; > + error =3D vfs_open(path, f); > + if (error) { > + fput(f); > + f =3D ERR_PTR(error); > + } > + } > + return f; > +} > + > /** > * dentry_create - Create and open a file > * @path: path to create > @@ -1215,7 +1232,7 @@ inline struct open_how build_open_how(int flags, um= ode_t mode) > inline int build_open_flags(const struct open_how *how, struct open_flag= s *op) > { > u64 flags =3D how->flags; > - u64 strip =3D __FMODE_NONOTIFY | O_CLOEXEC; > + u64 strip =3D O_CLOEXEC; > int lookup_flags =3D 0; > int acc_mode =3D ACC_MODE(flags); > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index e3c603d01337..18888d601550 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2731,6 +2731,8 @@ struct file *dentry_open(const struct path *path, i= nt flags, > struct file *dentry_create(const struct path *path, int flags, umode_t m= ode, > const struct cred *cred); > struct path *backing_file_user_path(struct file *f); > +struct file *dentry_open_nonotify(const struct path *path, int flags, > + const struct cred *creds); > > /* > * When mmapping a file on a stackable filesystem (e.g., overlayfs), the= file > @@ -3620,11 +3622,9 @@ struct ctl_table; > int __init list_bdev_fs_names(char *buf, size_t size); > > #define __FMODE_EXEC ((__force int) FMODE_EXEC) > -#define __FMODE_NONOTIFY ((__force int) FMODE_NONOTIFY) > > #define ACC_MODE(x) ("\004\002\006\006"[(x)&O_ACCMODE]) > -#define OPEN_FMODE(flag) ((__force fmode_t)(((flag + 1) & O_ACCMODE) | \ > - (flag & __FMODE_NONOTIFY))) > +#define OPEN_FMODE(flag) ((__force fmode_t)(((flag + 1) & O_ACCMODE))) > > static inline bool is_sxid(umode_t mode) > { > diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/= fcntl.h > index 80f37a0d40d7..613475285643 100644 > --- a/include/uapi/asm-generic/fcntl.h > +++ b/include/uapi/asm-generic/fcntl.h > @@ -6,7 +6,6 @@ > > /* > * FMODE_EXEC is 0x20 > - * FMODE_NONOTIFY is 0x4000000 > * These cannot be used by userspace O_* until internal and external ope= n > * flags are split. > * -Eric Paris Nice. I will take it for a test drive. Thanks, Amir.