Date: Tue, 31 Oct 2023 12:42:39 +1100
From: Dave Chinner
To: Linus Torvalds
Cc: Jeff Layton, Amir Goldstein, Kent Overstreet, Christian Brauner,
	Alexander Viro, John Stultz, Thomas Gleixner, Stephen Boyd,
	Chandan Babu R, "Darrick J. Wong", Theodore Ts'o, Andreas Dilger,
	Chris Mason, Josef Bacik, David Sterba, Hugh Dickins,
	Andrew Morton, Jan Kara, David Howells,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
References: <2ef9ac6180e47bc9cc8edef20648a000367c4ed2.camel@kernel.org>
	<6df5ea54463526a3d898ed2bd8a005166caa9381.camel@kernel.org>

On Mon, Oct 30, 2023 at 01:11:56PM -1000, Linus Torvalds wrote:
> On Mon, 30 Oct 2023 at 12:37, Dave Chinner wrote:
> >
> > If XFS can ignore relatime or lazytime persistent updates for given
> > situations, then *we don't need to make periodic on-disk updates of
> > atime*. This makes the whole problem of "persistent atime update bumps
> > i_version" go away because then we *aren't making persistent atime
> > updates* except when some other persistent modification that bumps
> > [cm]time occurs.
>
> Well, I think this should be split into two independent questions:
>
>  (a) are relatime or lazytime atime updates persistent if nothing else changes?

They only become persistent after 24 hours or, in the case of
relatime, immediately persistent if atime <= mtime (i.e. read after a
modification). Those are the only times that the VFS triggers
persistent writeback of atime, and it's the latter case
(atime <= mtime) that is the specific trigger that exposed the
problem with atime bumping i_version in the first place.

>  (b) do atime updates _ever_ update i_version *regardless* of relatime
> or lazytime?
>
> and honestly, I think the best answer to (b) would be that "no,
> i_version should simply not change for atime updates". And I think
> that answer is what it is because no user of i_version seems to want
> it.

As I keep repeating: Repeatedly stating that "atime should not bump
i_version" does not address the questions I'm asking *at all*.

> Now, the reason it's a single question for you is that apparently for
> XFS, the only thing that matters is "inode was written to disk" and
> that "di_changecount" value is thus related to the persistence of
> atime updates, but splitting di_changecount out to be a separate thing
> from i_version seems to be on the table, so I think those two things
> really could be independent issues.

Wrong way around - we'd have to split i_version out from
di_changecount. It's i_version that has changed semantics, not
di_changecount, and di_changecount behaviour must remain unchanged.

What I really don't want to do is implement a new i_version field in
the XFS on-disk format. What this redefinition of i_version
semantics has made clear is that i_version is *user defined
metadata*, not internal filesystem metadata that is defined by the
filesystem on-disk format.

User defined persistent metadata *belongs in xattrs*, not in the
core filesystem on-disk formats.

If the VFS wants to define and manage i_version behaviour, semantics
and persistence independently of the filesystems that manage the
persistent storage (as it clearly does!) then we should treat it
just like any other VFS defined inode metadata (e.g. per inode
objects like security constraints, ACLs, fsverity digests, fscrypt
keys, etc). i.e. it should be in a named xattr, not directly
implemented in the filesystem on-disk format definitions.

Then the application can change the meaning of the metadata whenever
and however it likes. Then filesystem developers just don't need to
care about it at all because the VFS specific persistent metadata is
not part of the on-disk format we need to maintain cross-platform
forwards and backwards compatibility for.

> > But I don't want to do this unconditionally - for systems not
> > running anything that samples i_version we want relatime/lazytime
> > to behave as they are supposed to and do periodic persistent updates
> > as per normal. Principle of least surprise and all that jazz.
>
> Well - see above: I think in a perfect world, we'd simply never change
> i_version at all for any atime updates, and relatime/lazytime simply
> wouldn't be an issue at all wrt i_version.

Right, that's what I'd like, especially as the new definition of
i_version - "only change when [cm]time changes" - means that the VFS
i_version is really now just a glorified timestamp.

> Wouldn't _that_ be the truly "least surprising" behavior? Considering
> that nobody wants i_version to change for what are otherwise pure
> reads (that's kind of the *definition* of atime, after all).

So, if you don't like the idea of us ignoring relatime/lazytime
conditionally, are we allowed to simply ignore them *all the time*
and do all timestamp updates in the manner that causes users the
least amount of pain?

I mean, relatime only exists because atime updates cause users pain.
lazytime only exists because relatime doesn't address the pain that
timestamp updates cause mixed read/write or pure O_DSYNC overwrite
workloads. noatime is a pain because it loses all atime updates.
There is no "one size is right for everyone", so why not just let
filesystems do what is most efficient from an internal IO and
persistence POV whilst still maintaining the majority of "expected"
behaviours?

Keep in mind, though, that this is all moot if we can get rid of
i_version entirely....

> Now, the annoyance here is that *both* (a) and (b) then have that
> impact of "i_version no longer tracks di_changecount".

.... and what is annoying is that the new i_version is just a
glorified ctime change counter. What we should be fixing is ctime -
integrating this change counting into ctime would allow us to make
i_version go away entirely. i.e.
We don't need a persistent ctime change counter if the ctime has
sufficient resolution or persistent encoding that it does not need
an external persistent change counter. That was the reasoning behind
the multi-grain timestamps. While the mgts implementation was
flawed, the reasoning behind it certainly isn't. We should be trying
to get rid of i_version by integrating it into ctime updates, not
arguing how atime vs i_version should work.

> So I don't think the issue here is "i_version" per se. I think in a
> vacuum, the best option of i_version is pretty obvious. But if you
> want i_version to track di_changecount, *then* you end up with that
> situation where the persistence of atime matters, and i_version needs
> to update whenever a (persistent) atime update happens.

Yet I don't want i_version to track di_changecount.

I want to *stop supporting i_version altogether* in XFS.

I want i_version as filesystem internal metadata to die completely.

I don't want to change the on disk format to add a new i_version
field because we'll be straight back in this same situation when the
next i_version bug is found and semantics get changed yet again.

Hence if we can encode the necessary change attributes into ctime,
we can drop VFS i_version support altogether. Then the "atime bumps
i_version" problem also goes away because then we *don't use
i_version*.

But if we can't get the VFS to do this with ctime, at least we have
the abstractions available to us (i.e. timestamp granularity and
statx change cookie) to allow XFS to implement this sort of
ctime-with-integrated-change-counter internally to the filesystem
and be able to drop i_version support....

[....]

> This really is all *entirely* an artifact of that "bi_changecount" vs
> "i_version" being tied together. You did seem to imply that you'd be
> ok with having "bi_changecount" be split from i_version, ie from an
> earlier email in this thread:
>
>   "Now that NFS is using a proper abstraction (i.e.
>   vfs_statx()) to get the change cookie, we really don't need to
>   expose di_changecount in i_version at all - we could simply copy
>   an internal di_changecount value into the statx cookie field in
>   xfs_vn_getattr() and there would be almost no change of behaviour
>   from the perspective of NFS and IMA at all"

.... which is what I was talking about here.

i.e. I was not talking about splitting i_version from
di_changecount - I was talking about being able to stop supporting
the VFS i_version counter entirely and still having NFS and IMA work
correctly.

Continually bringing the argument back to "atime vs i_version" misses
the bigger issues around this new i_version definition and
implementation, and the fact that the real solution should be to fix
ctime updates to make i_version at the VFS level go away forever.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
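
As a rough illustration of the ctime-with-integrated-change-counter
idea discussed above, here is a minimal sketch. It is not the
multigrain timestamp implementation and not anything XFS does today;
the function name next_ctime_ns() and the flat 64-bit nanosecond
representation are assumptions made purely for the example. The rule
it shows: take the coarse clock reading if it has moved past the
stored ctime, otherwise bump the stored ctime by one nanosecond so
that every change still produces a distinct, increasing value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: compute the ctime to store
 * for a new change. Times are nanoseconds since the epoch held in a
 * uint64_t, an assumption made to keep the sketch short.
 *
 * old_ctime_ns:  ctime currently stored in the inode
 * now_coarse_ns: current reading of a coarse (low resolution) clock
 */
uint64_t next_ctime_ns(uint64_t old_ctime_ns, uint64_t now_coarse_ns)
{
	/* Guard the +1 below against wrapping around. */
	assert(old_ctime_ns < UINT64_MAX);

	/* The clock has advanced past the stored ctime: use it as-is. */
	if (now_coarse_ns > old_ctime_ns)
		return now_coarse_ns;

	/*
	 * Same tick (or the stored ctime already runs ahead of the
	 * clock): bump by 1ns so this change is still distinguishable.
	 */
	return old_ctime_ns + 1;
}
```

With this rule, two changes inside one clock tick get ctimes T and
T+1, so a reader comparing ctime values sees both changes without
consulting a separate counter. The stored ctime can run ahead of the
clock if a tick sees a long burst of changes, which a real
implementation would have to bound.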