From: ronnie sahlberg
Date: Tue, 31 Oct 2023 09:34:17 +1000
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
To: Jeff Layton
Cc: Dave Chinner, Amir Goldstein, Linus Torvalds, Kent Overstreet,
 Christian Brauner, Alexander Viro, John Stultz, Thomas Gleixner,
 Stephen Boyd, Chandan Babu R, "Darrick J. Wong", "Theodore Ts'o",
 Andreas Dilger, Chris Mason, Josef Bacik, David Sterba, Hugh Dickins,
 Andrew Morton, Jan Kara, David Howells, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
 linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
 linux-mm@kvack.org, linux-nfs@vger.kernel.org
References: <61b32a4093948ae1ae8603688793f07de764430f.camel@kernel.org>
 <2ef9ac6180e47bc9cc8edef20648a000367c4ed2.camel@kernel.org>
 <6df5ea54463526a3d898ed2bd8a005166caa9381.camel@kernel.org>
In-Reply-To: <6df5ea54463526a3d898ed2bd8a005166caa9381.camel@kernel.org>

On Fri, 27 Oct 2023 at 20:36, Jeff Layton wrote:
>
> On Thu, 2023-10-26 at 13:20 +1100, Dave Chinner wrote:
> > On Wed, Oct 25, 2023 at 08:25:35AM -0400, Jeff Layton wrote:
> > > On Wed, 2023-10-25 at 19:05 +1100, Dave Chinner wrote:
> > > > On Tue, Oct 24, 2023 at 02:40:06PM -0400, Jeff Layton wrote:
> > > > > On Tue, 2023-10-24 at 10:08 +0300, Amir Goldstein wrote:
> > > > > > On Tue, Oct 24, 2023 at 6:40 AM Dave Chinner wrote:
> > > > > > >
> > > > > > > On Mon, Oct 23, 2023 at 02:18:12PM -1000, Linus Torvalds wrote:
> > > > > > > > On Mon, 23 Oct 2023 at 13:26, Dave Chinner wrote:
> > > > > > Does xfs_repair guarantee that changes of atime, or any inode changes
> > > > > > for that matter, update i_version? No, it does not.
> > > > > > So IMO, "atime does not update i_version" is not an "on-disk format change",
> > > > > > it is a runtime behavior change, just like lazytime is.
> > > > >
> > > > > This would certainly be my preference. I don't want to break any
> > > > > existing users though.
> > > >
> > > > That's why I'm trying to get some kind of consensus on what
> > > > rules and/or atime configurations people are happy for me to break
> > > > to make it look to users like there's a viable working change
> > > > attribute being supplied by XFS without needing to change the on
> > > > disk format.
> > > >
> > >
> > > I agree that the only bone of contention is whether to count atime
> > > updates against the change attribute. I think we have consensus that all
> > > in-kernel users do _not_ want atime updates counted against the change
> > > attribute. The only real question is these "legacy" users of
> > > di_changecount.
> >
> > Please stop referring to "legacy users" of di_changecount. Whether
> > there are users or not is irrelevant - it is defined by the current
> > on-disk format specification, and as such there may be applications
> > we do not know about making use of the current behaviour.
> >
> > It's like a Linux syscall - we can't remove them because there may
> > be some user we don't know about still using that old syscall. We
> > simply don't make changes that can potentially break user
> > applications like that.
> >
> > The on disk format is the same - there is software out there that we
> > don't know about that expects a certain behaviour based on the
> > specification. We don't break the on disk format by making silent
> > behavioural changes - we require a feature flag to indicate
> > behaviour has changed so that applications can take appropriate
> > actions with stuff they don't understand.
> >
> > The example for this is the BIGTIME timestamp format change. The on
> > disk inode structure is physically unchanged, but the contents of
> > the timestamp fields are encoded very differently. Sure, the older
> > kernels can read the timestamp data without any sort of problem
> > occurring, except for the fact the timestamps now appear to be
> > completely corrupted.
> >
> > Changing the meaning of the contents of di_changecount is no
> > different. It might look OK and nothing crashes, but nothing can be
> > inferred from the value in the field because we don't know how it
> > has been modified.
> >
> > Hence we can't just change the meaning, encoding or behaviour of an
> > on disk field that would result in existing kernels and applications
> > doing the wrong thing with that field (either read or write) without
> > adding a feature flag to indicate what behaviour that field should
> > have.
> >
> > > > > Perhaps this ought to be a mkfs option? Existing XFS filesystems could
> > > > > still behave with the legacy behavior, but we could make mkfs.xfs build
> > > > > filesystems by default that work like NFS requires.
> > > >
> > > > If we require mkfs to set a flag to change behaviour, then we're
> > > > talking about making an explicit on-disk format change to select the
> > > > optional behaviour. That's precisely what I want to avoid.
> > > >
> > >
> > > Right. The on-disk di_changecount would have a (subtly) different
> > > meaning at that point.
> > >
> > > It's not a change that requires drastic retooling though. If we were to
> > > do this, we wouldn't need to grow the on-disk inode. Booting to an older
> > > kernel would cause the behavior to revert. That's sub-optimal, but not
> > > fatal.
> >
> > See above: redefining the contents, behaviour or encoding of an on
> > disk field is a change of the on-disk format specification.
> >
> > The rules for on disk format changes that we work to were set in
> > place long before I started working on XFS. They are sane, well
> > thought out rules that have stood the test of time and massive new
> > feature introductions (CRCs, reflink, rmap, etc). And they only work
> > because we don't allow anyone to bend them for convenience, short
> > cuts or expediting their pet project.
> >
> > > What I don't quite understand is how these tools are accessing
> > > di_changecount?
> >
> > As I keep saying: this is largely irrelevant to the problem at hand.
> >
> > > XFS only accesses the di_changecount to propagate the value to and from
> > > the i_version,
> >
> > Yes. XFS has a strong separation between on-disk structures and
> > in-memory values, and i_version is simply the in-memory field we use
> > to store the current di_changecount value. We force bump i_version
> > every time we modify the inode core regardless of whether anyone has
> > queried i_version because that's what di_changecount requires. i.e.
> > the filesystem controls the contents of i_version, not the VFS.
> >
> > Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get
> > the change cookie, we really don't need to expose di_changecount in
> > i_version at all - we could simply copy an internal di_changecount
> > value into the statx cookie field in xfs_vn_getattr() and there
> > would be almost no change of behaviour from the perspective of NFS
> > and IMA at all.
> >
> > > and there is nothing besides NFSD and IMA that queries
> > > the i_version value in-kernel. So, this must be done via some sort of
> > > userland tool that is directly accessing the block device (or some 3rd
> > > party kernel module).
> >
> > Yup, both of those sorts of applications exist. e.g. the DMAPI kernel
> > module allows direct access to inode metadata through a custom
> > bulkstat formatter implementation - it returns different information
> > compared to the standard XFS one in the upstream kernel.
> >
> > > In earlier discussions you alluded to some repair and/or analysis tools
> > > that depended on this counter.
> >
> > Yes, and one of those "tools" is *me*.
> >
> > I frequently look at the di_changecount when doing forensic and/or
> > failure analysis on filesystem corpses. SOE analysis, relative
> > modification activity, etc all give insight into what happened to
> > the filesystem to get it into the state it is currently in, and
> > di_changecount provides information no other metadata in the inode
> > contains.
> >
> > > I took a quick look in xfsprogs, but I
> > > didn't see anything there. Is there a library or something that these
> > > tools use to get at this value?
> >
> > xfs_db is the tool I use for this, such as:
> >
> > $ sudo xfs_db -c "sb 0" -c "a rootino" -c "p v3.change_count" /dev/mapper/fast
> > v3.change_count = 35
> > $
> >
> > The root inode in this filesystem has a change count of 35. The root
> > inode has 32 dirents in it, which means that no entries have ever
> > been removed or renamed. This sort of insight into the past history
> > of inode metadata is largely impossible to get any other way, and
> > it's been the difference between understanding failure and having no
> > clue more than once.
> >
> > Most block device parsing applications simply write their own
> > decoder that walks the on-disk format. That's pretty trivial to do,
> > developers can get all the information needed to do this from the
> > on-disk format specification documentation we keep on kernel.org...
> >
>
> Fair enough. I'm not here to tell you guys that you need to
> change how di_changecount works. If it's too valuable to keep it
> counting atime-only updates, then so be it.
>
> If that's the case however, and given that the multigrain timestamp work
> is effectively dead, then I don't see an alternative to growing the
> on-disk inode. Do you?

Would a new mount option be a viable alternative?
A new option that, when used, would change the semantics of these fields
to what NFS needs?
With the caveat: using this mount option may break other special
tools that depend on the default semantics.

> --
> Jeff Layton
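
To make the mount-option question concrete, here is a minimal userspace sketch of the two counting behaviours being discussed. It is not XFS code and does not reflect any existing mount option: the `Inode` class, the `skip_atime_bump` flag, and the helper names are all hypothetical, chosen only to illustrate "count every inode change" versus "do not count atime-only updates".

```python
from dataclasses import dataclass


@dataclass
class Inode:
    """Toy inode, NOT the XFS on-disk layout (hypothetical illustration)."""
    atime: int = 0
    mtime: int = 0
    change_count: int = 0  # stands in for di_changecount


def touch_atime(inode: Inode, now: int, skip_atime_bump: bool) -> None:
    """Record an access; bump the counter unless the hypothetical option is set."""
    inode.atime = now
    if not skip_atime_bump:
        inode.change_count += 1


def write_data(inode: Inode, now: int) -> None:
    """Record a data modification; this bumps the counter in both modes."""
    inode.mtime = now
    inode.change_count += 1


def run(skip_atime_bump: bool) -> int:
    """Apply one write followed by two reads and return the final counter."""
    inode = Inode()
    write_data(inode, now=1)
    touch_atime(inode, now=2, skip_atime_bump=skip_atime_bump)
    touch_atime(inode, now=3, skip_atime_bump=skip_atime_bump)
    return inode.change_count


default_count = run(skip_atime_bump=False)  # every inode change is counted
nfs_count = run(skip_atime_bump=True)       # atime-only updates are not counted
print(default_count, nfs_count)
```

The same sequence of operations yields two different counter values, and the value alone does not say which mode produced it. That is the concern raised earlier in the thread about changing behaviour without a feature flag, and the reason for the caveat about tools that depend on the default semantics.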