From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 666C5C7EE23 for ; Tue, 23 May 2023 12:46:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AB71A900002; Tue, 23 May 2023 08:46:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A66CE6B0075; Tue, 23 May 2023 08:46:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 92E9C900002; Tue, 23 May 2023 08:46:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 8547A6B0074 for ; Tue, 23 May 2023 08:46:14 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 1E8711C7361 for ; Tue, 23 May 2023 12:46:14 +0000 (UTC) X-FDA: 80821492668.11.449073E Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf30.hostedemail.com (Postfix) with ESMTP id E65DD8001B for ; Tue, 23 May 2023 12:46:10 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=eOBxwWMb; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=s3H7NU7J; spf=pass (imf30.hostedemail.com: domain of jack@suse.cz designates 195.135.220.28 as permitted sender) smtp.mailfrom=jack@suse.cz; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684845971; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=D2nL3xyYPlMdtq9qs01YJNIyc/qfR7bb6+Vphl3LLao=; b=sXKz69OjGtIJ54S/+RmxqhSNWsBO7usli4k12xX+55nRIjirzmqRIEVIyntGQCO6sddz1y CQthhVTi8Q+tqad62QKWXyKZ4LkjhEdOVreph8gFBhQLm60nW5TF5TqePgt+B38O1m6w87 kCn8zTLDCO/ryAMzktSdiE9r3Jr/0dE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684845971; a=rsa-sha256; cv=none; b=rsHwv1ljw3n2Biud4SNbJFkIc0e9AKRNfe/+BtGMcUKSGDHfvZDl9rqrmpllGxqsufUtnL hAKfv1lSxcPeyzmztWPluq1JIorzTgJpEYnfLs3ftV7twyXZv3ZLq1Y0YIsF1bwrCG6irU J+KxwvhmqJuqf+CFTY043ci5H6Trjmw= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=eOBxwWMb; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=s3H7NU7J; spf=pass (imf30.hostedemail.com: domain of jack@suse.cz designates 195.135.220.28 as permitted sender) smtp.mailfrom=jack@suse.cz; dmarc=none Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 3F71C22A61; Tue, 23 May 2023 12:46:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1684845969; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=D2nL3xyYPlMdtq9qs01YJNIyc/qfR7bb6+Vphl3LLao=; b=eOBxwWMb3EwllwDof3+xAu6djq6MmFwQorTzrxkBCEBmVr+ed4A3SGo6BLOuvPawqLNp8j +bLaynXcHXPaZ9hduIypIgdVXKOOjv8BKzR99NWAl5wWqkzGRzyyxyXBxy11ZGUNQ/XV9i 1RTWYrBE9v44zs3IXsZc7kq0jspZ7uI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1684845969; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=D2nL3xyYPlMdtq9qs01YJNIyc/qfR7bb6+Vphl3LLao=; b=s3H7NU7JGOmO3C7j1CKkYSeoBVTvMWjfTQOo/Ja+pc5RBlLw2VM4u3aqEqD6VQCd6TXpp1 Fz805agjqIxvCfCg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id DF2EC13A10; Tue, 23 May 2023 12:46:08 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id IkZuNpC1bGSyMgAAMHmgww (envelope-from ); Tue, 23 May 2023 12:46:08 +0000 Received: by quack3.suse.cz (Postfix, from userid 1000) id 92B3CA075D; Tue, 23 May 2023 14:46:06 +0200 (CEST) Date: Tue, 23 May 2023 14:46:06 +0200 From: Jan Kara To: Jeff Layton Cc: Jan Kara , Alexander Viro , Christian Brauner , "Darrick J. Wong" , Hugh Dickins , Andrew Morton , Dave Chinner , Chuck Lever , Amir Goldstein , David Howells , Neil Brown , Matthew Wilcox , Andreas Dilger , Theodore T'so , Chris Mason , Josef Bacik , David Sterba , Namjae Jeon , Steve French , Sergey Senozhatsky , Tom Talpey , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-mm@kvack.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org Subject: Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime Message-ID: <20230523124606.bkkhwi6b67ieeygl@quack3> References: <20230518114742.128950-1-jlayton@kernel.org> <20230518114742.128950-3-jlayton@kernel.org> <20230523100240.mgeu4y46friv7hau@quack3> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: E65DD8001B X-Rspam-User: X-Stat-Signature: d6hd1x9rfrwcd49n4djj4akk9d4jaxer X-Rspamd-Server: rspam03 X-HE-Tag: 1684845970-544285 X-HE-Meta: U2FsdGVkX1++9mkYHB5bRnoTfvSK9ItMnX+SPyrU9LXN2ZgngXZK2ll9dDcwVlMcPAGcAevpG3Sdg44l4JCb2mczFJJErGBp/ZCL8Bon8l3bJHwfnk5F8cTplHCrJBIccNmLBNwBo0eP7UUEuDm7OHjE4z6hWU/4EBS1uZBeVfCIl95lG5LFL5vASjLbBiogWWdOaCNtfRRWFVYeAT9rRqmnXx8Yd6fcuNviHrmFgljF3e8LLK+eern4qenqs8bstXBsg9512/nfQyn7i623oq87Om5mBWV9Two9idrJsOD71Ur9HWX14PYwLjkk7G6PdB0wtKJpRl6OLoMOblct77jR5yfayYyM6ETn5Kc4lFr7zufU4g8V3ni37Tzi3/y5/bWyUSaTqhaENGMWpBInqYqRFVbjKcfGyMOcOpQ31JfCzP2J1MUoZoYUFtwL0SbO9L6Skn9W+tKzFw0bXhsxGEahbuUGH1rjaGl7BAp4uvzDCIpV+K29zx1W47Y/pWbl/59yq0a+JTBLRBGSrLb4dsHdGieBwqm+PieV+XbLZU3hWQs3Q2gOfFgY+N1nc0Y5n+Lh84GBOTUdSG6aU+ftCRl1NGM3i9uAEUcs8Ua30D1Y+u1MP4/7A9UCRh1BPjheGCdIN4dH1Fq0kNA7Zc62Hol/wIbuKN5hkB96jB7WYM8pA85kzdqAb9xfcp/Nv1e5TbJ95V6YlgnF5YC8T5+3VYPd/8Cztu/QRhrk9CCjSv1jO+AS0dZEWRLvYvDyEOWcJqvISeTkItILQRslw1SXRnkLQRHZJ1sKnTaq8MT1+LLbFM8N3PgyFRTEXm4qMQTZySBR1aKCngzC+1cpgubKPVR40M2UxgzaaagTh9l67GAKNvBxqXRg5MQPLeKfUh/Vu4S3D74yyel+3489tQq5GO/JZYEHop8yi3ggeePiWm25MaUzXyceBwF6OA/mM0MCrlcS+MGP97fI9shGSA2 flQ73tIX UG+R7QGEgV+fyhaGK0JGK93ys2OSXgFDTl2tdQdVmKZP/OByoHo2Rdmve6t1LqQesfI+OBZoX8V5FfpdBk/5TqXT6xznM8dVXMhEtg1wSIqo9QHpDU9Q/pRA6xScPmgeVvMVycSAylkuOtXWyYMiHLmfS786Asr3+Tf5mfyU5VzYzwHZZadFgLB3GXhbAsEeBQwHvSLrkE1CwtVinwFGTo5iMdmBbVyZepsJmhvDXFqeSmaFUmuWirmkCBfci8fbQOJq3biYercAv+GVBN/MV6lUkfPADXfwToAu8091kD4xBkJAdVmTwxxIqdfWDIRmyTTYrgzbI8DIoUFvdsj/FK6CC5gZqVROlz5dFpiUw1yGvGy0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue 23-05-23 06:40:08, Jeff Layton wrote: > On Tue, 2023-05-23 at 12:02 +0200, Jan Kara wrote: > > On Thu 18-05-23 07:47:35, Jeff Layton wrote: > > > The VFS always uses coarse-grained timestamp updates for filling out the > > > ctime and mtime after a change. This has the benefit of allowing > > > filesystems to optimize away a lot metadata updates, down to around 1 > > > per jiffy, even when a file is under heavy writes. > > > > > > Unfortunately, this has always been an issue when we're exporting via > > > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a > > > lot of exported filesystems don't properly support a change attribute > > > and are subject to the same problems with timestamp granularity. Other > > > applications have similar issues (e.g backup applications). > > > > > > Switching to always using fine-grained timestamps would improve the > > > situation, but that becomes rather expensive, as the underlying > > > filesystem will have to log a lot more metadata updates. > > > > > > What we need is a way to only use fine-grained timestamps when they are > > > being actively queried. > > > > > > The kernel always stores normalized ctime values, so only the first 30 > > > bits of the tv_nsec field are ever used. Whenever the mtime changes, the > > > ctime must also change. > > > > > > Use the 31st bit of the ctime tv_nsec field to indicate that something > > > has queried the inode for the i_mtime or i_ctime. When this flag is set, > > > on the next timestamp update, the kernel can fetch a fine-grained > > > timestamp instead of the usual coarse-grained one. > > > > > > This patch adds the infrastructure this scheme. Filesytems can opt > > > into it by setting the FS_MULTIGRAIN_TS flag in the fstype. > > > > > > Later patches will convert individual filesystems over to use it. > > > > > > Signed-off-by: Jeff Layton > > > > So there are two things I dislike about this series because I think they > > are fragile: > > > > 1) If we have a filesystem supporting multigrain ts and someone > > accidentally directly uses the value of inode->i_ctime, he can get bogus > > value (with QUERIED flag). This mistake is very easy to do. So I think we > > should rename i_ctime to something like __i_ctime and always use accessor > > function for it. > > > > We could do this, but it'll be quite invasive. We'd have to change any > place that touches i_ctime (and there are a lot of them), even on > filesystems that are not being converted. Yes, that's why I suggested Coccinelle to deal with this. > > 2) As I already commented in a previous version of the series, the scheme > > with just one flag for both ctime and mtime and flag getting cleared in > > current_time() relies on the fact that filesystems always do an equivalent > > of: > > > > inode->i_mtime = inode->i_ctime = current_time(); > > > > Otherwise we can do coarse grained update where we should have done a fine > > grained one. Filesystems often update timestamps like this but not > > universally. Grepping shows some instances where only inode->i_mtime is set > > from current_time() e.g. in autofs or bfs. Again a mistake that is rather > > easy to make and results in subtle issues. I think this would be also > > nicely solved by renaming i_ctime to __i_ctime and using a function to set > > ctime. Mtime could then be updated with inode->i_mtime = ctime_peek(). > > > > I understand this is quite some churn but a very mechanical one that could > > be just done with Coccinelle and a few manual fixups. So IMHO it is worth > > the more robust result. > > AFAICT, under POSIX, you must _always_ set the ctime when you set the > mtime, but the reverse is not true. That's why keeping the flag in the > ctime makes sense. If we're updating the mtime, then we necessarily must > update the ctime. Yes, but technically speaking: inode->i_mtime = current_time(); inode->i_ctime = current_time(); follows these requirements but is broken with your changes in unobvious way. And if mtime update is hidden in some function or condition, it is not even that suboptimal coding pattern. > > > +} > > > + > > > +/** > > > + * ctime_peek - peek at (but don't query) the ctime > > > + * @inode: inode to fetch the ctime from > > > + * > > > + * Grab the current ctime from the inode, sans I_CTIME_QUERIED flag. For > > > + * use by internal consumers that don't require a fine-grained update on > > > + * the next change. > > > + * > > > + * This is safe to call regardless of whether the underlying filesystem > > > + * is using multigrain timestamps. > > > + */ > > > +static inline struct timespec64 ctime_peek(const struct inode *inode) > > > +{ > > > + struct timespec64 ctime; > > > + > > > + ctime.tv_sec = inode->i_ctime.tv_sec; > > > + ctime.tv_nsec = ctime_nsec_peek(inode); > > > + > > > + return ctime; > > > +} > > > > Given this is in a header that gets included in a lot of places, maybe we > > should call it like inode_ctime_peek() or inode_ctime_get() to reduce > > chances of a name clash? > > I'd be fine with that, but "ctime" sort of implies inode->i_ctime to me. > We don't really use that nomenclature elsewhere. Yes, I don't insist here but we do have 'ctime' e.g. in IPC code (sem_ctime, shm_ctime, msg_ctime), we have ctime in md layer, ctime(3) is also a glibc function. So it is not *completely* impossible the clash happens but we can deal with it when it happens. Honza -- Jan Kara SUSE Labs, CR