Message-ID: <5ab880070c7d236928b90d9475a660cc0ab89c73.camel@kernel.org>
Subject: Re: [PATCH v7 12/13] ext4: switch to multigrain timestamps
From: Jeff Layton
To: Christian Brauner
Cc: Bruno Haible, Jan Kara, Xi Ruoyao, bug-gnulib@gnu.org, Alexander Viro,
 Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
 Christian Schoenebeck, David Howells, Marc Dionne, Chris Mason,
 Josef Bacik, David Sterba, Xiubo Li, Ilya Dryomov, Jan Harkes,
 coda@cs.cmu.edu, Tyler Hicks, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
 Namjae Jeon, Sungjong Seo, Jan Kara, Theodore Ts'o, Andreas Dilger,
 Jaegeuk Kim, OGAWA Hirofumi, Miklos Szeredi, Bob Peterson,
 Andreas Gruenbacher, Greg Kroah-Hartman, Tejun Heo, Trond Myklebust,
 Anna Schumaker, Konstantin Komarov, Mark Fasheh, Joel Becker, Joseph Qi,
 Mike Marshall, Martin Brandenburg, Luis Chamberlain, Kees Cook,
 Iurii Zaikin, Steve French, Paulo Alcantara, Ronnie Sahlberg,
 Shyam Prasad N, Tom Talpey, Sergey Senozhatsky, Richard Weinberger,
 Hans de Goede, Hugh Dickins, Andrew Morton, Amir Goldstein,
 "Darrick J. Wong", Benjamin Coddington, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, v9fs@lists.linux.dev,
 linux-afs@lists.infradead.org, linux-btrfs@vger.kernel.org,
 ceph-devel@vger.kernel.org, codalist@coda.cs.cmu.edu,
 ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org,
 linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net,
 cluster-devel@redhat.com, linux-nfs@vger.kernel.org, ntfs3@lists.linux.dev,
 ocfs2-devel@lists.linux.dev, devel@lists.orangefs.org,
 linux-cifs@vger.kernel.org, samba-technical@lists.samba.org,
 linux-mtd@lists.infradead.org, linux-mm@kvack.org,
 linux-unionfs@vger.kernel.org, linux-xfs@vger.kernel.org
Date: Wed, 20 Sep 2023 05:56:50 -0400
In-Reply-To: <20230920-leerung-krokodil-52ec6cb44707@brauner>
References: <20230807-mgctime-v7-0-d1dec143a704@kernel.org>
 <20230919110457.7fnmzo4nqsi43yqq@quack3>
 <1f29102c09c60661758c5376018eac43f774c462.camel@kernel.org>
 <4511209.uG2h0Jr0uP@nimes>
 <08b5c6fd3b08b87fa564bb562d89381dd4e05b6a.camel@kernel.org>
 <20230920-leerung-krokodil-52ec6cb44707@brauner>

On Wed, 2023-09-20 at 10:41 +0200, Christian Brauner wrote:
> > > f1 was last written to *after* f2 was last written to. If the timestamp of f1
> > > is then lower than the timestamp of f2, timestamps are fundamentally broken.
> > >
> > > Many things in user-space depend on timestamps, such as build system
> > > centered around 'make', but also 'find ... -newer ...'.
> > >
> >
> >
> > What does breakage with make look like in this situation? The "fuzz"
> > here is going to be on the order of a jiffy. The typical case for make
> > timestamp comparisons is comparing source files vs. a build target. If
> > those are being written nearly simultaneously, then that could be an
> > issue, but is that a typical behavior? It seems like it would be hard to
> > rely on that anyway, esp. given filesystems like NFS that can do lazy
> > writeback.
> >
> > One of the operating principles with this series is that timestamps can
> > be of varying granularity between different files. Note that Linux
> > already violates this assumption when you're working across filesystems
> > of different types.
> >
> > As to potential fixes if this is a real problem:
> >
> > I don't really want to put this behind a mount or mkfs option (a'la
> > relatime, etc.), but that is one possibility.
> >
> > I wonder if it would be feasible to just advance the coarse-grained
> > current_time whenever we end up updating a ctime with a fine-grained
> > timestamp? It might produce some inode write amplification. Files that
>
> Less than ideal imho.
>
> If this risks breaking existing workloads by enabling it unconditionally
> and there isn't a clear way to detect and handle these situations
> without risk of regression then we should move this behind a mount
> option.
>
> So how about the following:
>
> From cb14add421967f6e374eb77c36cc4a0526b10d17 Mon Sep 17 00:00:00 2001
> From: Christian Brauner
> Date: Wed, 20 Sep 2023 10:00:08 +0200
> Subject: [PATCH] vfs: move multi-grain timestamps behind a mount option
>
> While we initially thought we can do this unconditionally it turns out
> that this might break existing workloads that rely on timestamps in very
> specific ways and we always knew this was a possibility. Move
> multi-grain timestamps behind a vfs mount option.
>
> Signed-off-by: Christian Brauner
> ---
>  fs/fs_context.c     | 18 ++++++++++++++++++
>  fs/inode.c          |  4 ++--
>  fs/proc_namespace.c |  1 +
>  fs/stat.c           |  2 +-
>  include/linux/fs.h  |  4 +++-
>  5 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/fs/fs_context.c b/fs/fs_context.c
> index a0ad7a0c4680..dd4dade0bb9e 100644
> --- a/fs/fs_context.c
> +++ b/fs/fs_context.c
> @@ -44,6 +44,7 @@ static const struct constant_table common_set_sb_flag[] = {
>  	{ "mand",	SB_MANDLOCK },
>  	{ "ro",		SB_RDONLY },
>  	{ "sync",	SB_SYNCHRONOUS },
> +	{ "mgtime",	SB_MGTIME },
>  	{ },
>  };
>
> @@ -52,18 +53,32 @@ static const struct constant_table common_clear_sb_flag[] = {
>  	{ "nolazytime",	SB_LAZYTIME },
>  	{ "nomand",	SB_MANDLOCK },
>  	{ "rw",		SB_RDONLY },
> +	{ "nomgtime",	SB_MGTIME },
>  	{ },
>  };
>
> +static inline int check_mgtime(unsigned int token, const struct fs_context *fc)
> +{
> +	if (token != SB_MGTIME)
> +		return 0;
> +	if (!(fc->fs_type->fs_flags & FS_MGTIME))
> +		return invalf(fc, "Filesystem doesn't support multi-grain timestamps");
> +	return 0;
> +}
> +
>  /*
>   * Check for a common mount option that manipulates s_flags.
>   */
>  static int vfs_parse_sb_flag(struct fs_context *fc, const char *key)
>  {
>  	unsigned int token;
> +	int ret;
>
>  	token = lookup_constant(common_set_sb_flag, key, 0);
>  	if (token) {
> +		ret = check_mgtime(token, fc);
> +		if (ret)
> +			return ret;
>  		fc->sb_flags |= token;
>  		fc->sb_flags_mask |= token;
>  		return 0;
> @@ -71,6 +86,9 @@ static int vfs_parse_sb_flag(struct fs_context *fc, const char *key)
>
>  	token = lookup_constant(common_clear_sb_flag, key, 0);
>  	if (token) {
> +		ret = check_mgtime(token, fc);
> +		if (ret)
> +			return ret;
>  		fc->sb_flags &= ~token;
>  		fc->sb_flags_mask |= token;
>  		return 0;
> diff --git a/fs/inode.c b/fs/inode.c
> index 54237f4242ff..fd1a2390aaa3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2141,7 +2141,7 @@ EXPORT_SYMBOL(current_mgtime);
>
>  static struct timespec64 current_ctime(struct inode *inode)
>  {
> -	if (is_mgtime(inode))
> +	if (IS_MGTIME(inode))
>  		return current_mgtime(inode);
>  	return current_time(inode);
>  }
> @@ -2588,7 +2588,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
>  	now = current_time(inode);
>
>  	/* Just copy it into place if it's not multigrain */
> -	if (!is_mgtime(inode)) {
> +	if (!IS_MGTIME(inode)) {
>  		inode_set_ctime_to_ts(inode, now);
>  		return now;
>  	}
> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> index 250eb5bf7b52..08f5bf4d2c6c 100644
> --- a/fs/proc_namespace.c
> +++ b/fs/proc_namespace.c
> @@ -49,6 +49,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
>  		{ SB_DIRSYNC, ",dirsync" },
>  		{ SB_MANDLOCK, ",mand" },
>  		{ SB_LAZYTIME, ",lazytime" },
> +		{ SB_MGTIME, ",mgtime" },
>  		{ 0, NULL }
>  	};
>  	const struct proc_fs_opts *fs_infop;
> diff --git a/fs/stat.c b/fs/stat.c
> index 6e60389d6a15..2f18dd5de18b 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -90,7 +90,7 @@ void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask,
>  	stat->size = i_size_read(inode);
>  	stat->atime = inode->i_atime;
>
> -	if (is_mgtime(inode)) {
> +	if (IS_MGTIME(inode)) {
>  		fill_mg_cmtime(stat, request_mask, inode);
>  	} else {
>  		stat->mtime = inode->i_mtime;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 4aeb3fa11927..03e415fb3a7c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1114,6 +1114,7 @@ extern int send_sigurg(struct fown_struct *fown);
>  #define SB_NODEV	BIT(2)	/* Disallow access to device special files */
>  #define SB_NOEXEC	BIT(3)	/* Disallow program execution */
>  #define SB_SYNCHRONOUS	BIT(4)	/* Writes are synced at once */
> +#define SB_MGTIME	BIT(5)	/* Use multi-grain timestamps */
>  #define SB_MANDLOCK	BIT(6)	/* Allow mandatory locks on an FS */
>  #define SB_DIRSYNC	BIT(7)	/* Directory modifications are synchronous */
>  #define SB_NOATIME	BIT(10)	/* Do not update access times. */
> @@ -2105,6 +2106,7 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
>  	((inode)->i_flags & (S_SYNC|S_DIRSYNC)))
>  #define IS_MANDLOCK(inode)	__IS_FLG(inode, SB_MANDLOCK)
>  #define IS_NOATIME(inode)	__IS_FLG(inode, SB_RDONLY|SB_NOATIME)
> +#define IS_MGTIME(inode)	__IS_FLG(inode, SB_MGTIME)
>  #define IS_I_VERSION(inode)	__IS_FLG(inode, SB_I_VERSION)
>
>  #define IS_NOQUOTA(inode)	((inode)->i_flags & S_NOQUOTA)
> @@ -2366,7 +2368,7 @@ struct file_system_type {
>   */
>  static inline bool is_mgtime(const struct inode *inode)
>  {
> -	return inode->i_sb->s_type->fs_flags & FS_MGTIME;
> +	return inode->i_sb->s_flags & SB_MGTIME;
>  }
>
>  extern struct dentry *mount_bdev(struct file_system_type *fs_type,

The mount option looks reasonable. Thanks for throwing together the patch.
Maybe in the future we can come up with a way to mitigate the problems
and do this unconditionally?

Reviewed-by: Jeff Layton
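
For anyone following along, here is a rough userspace sketch of the option
handling the quoted patch adds to vfs_parse_sb_flag(). It is not kernel code:
struct opt, struct ctx, lookup() and parse_sb_flag() are hypothetical
stand-ins, the FS_MGTIME value is only illustrative, and the return values
(-1 for a rejected option, 1 for an unrecognized key) are placeholders for
what the kernel reports through invalf() and its own error codes. It is only
meant to show that both "mgtime" and "nomgtime" are refused unless the
filesystem type advertises FS_MGTIME.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of the patch; not kernel code. */
#define SB_RDONLY (1u << 0)
#define SB_MGTIME (1u << 5)   /* bit chosen by the patch */
#define FS_MGTIME (1u << 6)   /* illustrative value only (assumption) */

struct opt {
	const char *name;
	unsigned int flag;
};

/* Options that set a superblock flag. */
static const struct opt set_opts[] = {
	{ "ro", SB_RDONLY },
	{ "mgtime", SB_MGTIME },
};

/* Options that clear a superblock flag. */
static const struct opt clear_opts[] = {
	{ "rw", SB_RDONLY },
	{ "nomgtime", SB_MGTIME },
};

struct ctx {
	unsigned int fs_flags;      /* capabilities of the filesystem type */
	unsigned int sb_flags;      /* flag values requested so far */
	unsigned int sb_flags_mask; /* flags that were explicitly mentioned */
};

/* Return the flag for key, or 0 if key is not in the table. */
static unsigned int lookup(const struct opt *tbl, size_t n, const char *key)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, key) == 0)
			return tbl[i].flag;
	return 0;
}

/* Mirrors check_mgtime(): only SB_MGTIME needs filesystem support. */
static int check_mgtime(unsigned int token, const struct ctx *c)
{
	if (token != SB_MGTIME)
		return 0;
	return (c->fs_flags & FS_MGTIME) ? 0 : -1;
}

/* 0: option applied, -1: option rejected, 1: not a common option. */
static int parse_sb_flag(struct ctx *c, const char *key)
{
	unsigned int token;

	token = lookup(set_opts, sizeof(set_opts) / sizeof(set_opts[0]), key);
	if (token) {
		if (check_mgtime(token, c))
			return -1;
		c->sb_flags |= token;
		c->sb_flags_mask |= token;
		return 0;
	}

	token = lookup(clear_opts, sizeof(clear_opts) / sizeof(clear_opts[0]), key);
	if (token) {
		if (check_mgtime(token, c))
			return -1;
		c->sb_flags &= ~token;
		c->sb_flags_mask |= token;
		return 0;
	}
	return 1;
}
```

With a context whose fs_flags includes FS_MGTIME, parse_sb_flag(&c, "mgtime")
sets SB_MGTIME in sb_flags; with fs_flags of 0 the same call is rejected and
sb_flags is left untouched.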