Date: Fri, 20 Oct 2023 09:02:24 +1100
From: Dave Chinner
To: Jeff Layton
Cc: Christian Brauner, Linus Torvalds, Alexander Viro, John Stultz,
 Thomas Gleixner, Stephen Boyd, Chandan Babu R, "Darrick J. Wong",
 Theodore Ts'o, Andreas Dilger, Chris Mason, Josef Bacik, David Sterba,
 Hugh Dickins, Andrew Morton, Amir Goldstein, Jan Kara, David Howells,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
 linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
 linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
References: <20231018-mgtime-v1-0-4a7a97b1f482@kernel.org>
 <20231018-mgtime-v1-2-4a7a97b1f482@kernel.org>
 <5f96e69d438ab96099bb67d16b77583c99911caa.camel@kernel.org>
 <20231019-fluor-skifahren-ec74ceb6c63e@brauner>
 <0a1a847af4372e62000b259e992850527f587205.camel@kernel.org>
In-Reply-To: <0a1a847af4372e62000b259e992850527f587205.camel@kernel.org>

On Thu, Oct 19, 2023 at 07:28:48AM -0400, Jeff Layton wrote:
> On Thu, 2023-10-19 at 11:29 +0200, Christian Brauner wrote:
> > > Back to your earlier point though:
> > >
> > > Is a global offset really a non-starter? I can see about doing something
> > > per-superblock, but ktime_get_mg_coarse_ts64 should be roughly as cheap
> > > as ktime_get_coarse_ts64. I don't see the downside there for the
> > > non-multigrain filesystems to call that.
> >
> > I have to say that this doesn't excite me. This whole thing feels a bit
> > hackish. I think that a change version is the way more sane way to go.
> >
>
> What is it about this set that feels so much more hackish to you? Most
> of this set is pretty similar to what we had to revert. Is it just the
> timekeeper changes? Why do you feel those are a problem?
>
> > >
> > > On another note: maybe I need to put this behind a Kconfig option
> > > initially too?
> >
> > So can we for a second consider not introducing fine-grained timestamps
> > at all. We let NFSv3 live with the cache problem it's been living with
> > forever.
> >
> > And for NFSv4 we actually do introduce a proper i_version for all
> > filesystems that matter to it.
> >
> > What filesystems exactly don't expose a proper i_version and what does
> > prevent them from adding one or fixing it?
>
> Certainly we can drop this series altogether if that's the consensus.
>
> The main exportable filesystem that doesn't have a suitable change
> counter now is XFS. Fixing it will require an on-disk format change to
> accommodate a new version counter that doesn't increment on atime
> updates. This is something the XFS folks were specifically looking to
> avoid, but maybe that's the simpler option.

And now we have travelled the full circle.

The problem NFS has with atime updates on XFS is a result of the
default behaviour of relatime - it *always* forces a persistent atime
update after mtime has changed. Hence a read-after-write operation
will trigger an atime update because atime is older than mtime. This
is what causes XFS to run a transaction (i.e. a persistent atime
update) and that bumps iversion.

lazytime does not behave this way - it delays all persistent
timestamp updates until the next persistent change or until the
lazytime aggregation period expires (24 hours).
Hence with lazytime, read-after-write operations do not trigger a
persistent atime update, and so XFS does not run a transaction to
update atime. Hence i_version does not get bumped, and NFS behaves as
expected.

IOWs, what the NFS server actually wants from the filesystems is for
lazy timestamp updates to always be used on read operations. It does
not want persistent timestamp updates that change on-disk state. The
recent "redefinition" of when i_version should change effectively
encodes this - i_version should only change when a persistent
metadata or data change is made that also changes [cm]time.

Hence the simple, in-memory solution to this problem is for NFS to
tell the filesystems that it needs to use lazy (in-memory) atime
updates for the given operation rather than persistent atime updates.

We already need to modify how atime updates work for io_uring -
io_uring needs atime updates to be guaranteed non-blocking similar to
updating mtime in the write IO path. If a persistent timestamp change
needs to be run, then the timestamp update needs to return -EAGAIN
rather than (potentially) blocking so the entire operation can be
punted to a context that can block.

This requires control flags to be passed to the core atime handling
functions. If a filesystem doesn't understand/support the flags, it
can just ignore them and do the update however it was going to do it.
It won't make anything work incorrectly, just might do something that
is not ideal.

With this new "non-blocking update only" flag for io_uring and a new
"non-persistent update only" flag for NFS, we have very similar
conditional atime update requirements from two completely independent
in-kernel applications.

IOWs, this can be solved quite simply by having the -application-
define the persistence semantics of the operation being performed.
Add a RWF_LAZYTIME/IOCB_LAZYTIME flag for read IO that is being
issued from the nfs daemon (i.e. passed to vfs_iter_read()) and then
the vfs/filesystem can do exactly the right thing for the IO being
issued.

This is what io_uring does with IOCB_NOWAIT to tell the filesystems
that the IO must be non-blocking, and it's the key we already use for
non-blocking mtime updates and will use to trigger non-blocking atime
updates....

I also know of cases where a per-IO RWF_LAZYTIME flag would be
beneficial - large databases are already using lazytime mount options
so that their data IO doesn't take persistent mtime update overhead
hits on every write IO.....

> There is also bcachefs which I don't think has a change attr yet. They'd
> also likely need an on-disk format change, but hopefully that's an easier
> thing to do there since it's a brand new filesystem.

It's not a "brand new filesystem". It's been out there for quite a
long while, and it has many users that would be impacted by on-disk
format changes at this point in its life. On-disk format changes are
a fairly major deal for filesystems, and if there is any way we can
avoid them we should.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
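
The relatime versus lazy atime behaviour discussed in this message can
be illustrated with a small userspace model. This is only a sketch in
Python, not kernel code: the Inode class, the write_file() and
read_file() helpers and the lazy parameter are hypothetical names
invented for the illustration, and the i_version bump on a persistent
atime update models the XFS behaviour described in the message rather
than any real interface. The staleness test follows the relatime rule
(refresh atime if it is not newer than mtime or ctime, or if it is at
least 24 hours old).

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # relatime staleness window, in seconds


@dataclass
class Inode:
    """Hypothetical in-memory inode; timestamps are plain seconds."""
    atime: int = 0
    mtime: int = 0
    ctime: int = 0
    i_version: int = 0        # change counter, bumped by persistent updates
    dirty_time: bool = False  # timestamps changed in memory only


def relatime_needs_update(inode: Inode, now: int) -> bool:
    """relatime rule: refresh atime if it is not newer than mtime or
    ctime, or if it is at least a day old."""
    return (inode.mtime >= inode.atime
            or inode.ctime >= inode.atime
            or now - inode.atime >= DAY)


def write_file(inode: Inode, now: int) -> None:
    """A data write: persistent [cm]time change, so i_version moves."""
    inode.mtime = inode.ctime = now
    inode.i_version += 1


def read_file(inode: Inode, now: int, lazy: bool) -> bool:
    """Return True if the read forced a persistent atime update.

    lazy=False models the behaviour the message attributes to relatime
    on XFS. lazy=True models the proposed per-IO lazy atime update
    (assumption: the update stays in memory and i_version is untouched).
    """
    if not relatime_needs_update(inode, now):
        return False
    inode.atime = now
    if lazy:
        inode.dirty_time = True  # remembered in memory, written back later
        return False
    inode.i_version += 1         # persistent update: the change counter moves
    return True
```

In this model, a write followed by read_file(inode, now, lazy=False)
reports a persistent update and changes i_version, which is the
read-after-write effect that confuses the NFS client cache. The same
read with lazy=True updates atime in memory only and leaves i_version
at the value the write produced.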