From: Amir Goldstein
Date: Wed, 25 Oct 2023 13:41:46 +0300
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
To: Dave Chinner
Cc: Jeff Layton, Linus Torvalds, Kent Overstreet, Christian Brauner,
    Alexander Viro, John Stultz, Thomas Gleixner, Stephen Boyd,
    Chandan Babu R, "Darrick J. Wong", "Theodore Ts'o", Andreas Dilger,
    Chris Mason, Josef Bacik, David Sterba, Hugh Dickins, Andrew Morton,
    Jan Kara, David Howells, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
    linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
    linux-mm@kvack.org, linux-nfs@vger.kernel.org
References: <0a1a847af4372e62000b259e992850527f587205.camel@kernel.org>
    <61b32a4093948ae1ae8603688793f07de764430f.camel@kernel.org>

On Wed, Oct 25, 2023 at 11:05 AM Dave Chinner wrote:
>
> On Tue, Oct 24, 2023 at 02:40:06PM -0400, Jeff Layton wrote:
> > On Tue, 2023-10-24 at 10:08 +0300, Amir Goldstein wrote:
> > > On Tue, Oct 24, 2023 at 6:40 AM Dave Chinner wrote:
> > > >
> > > > On Mon, Oct 23, 2023 at 02:18:12PM -1000, Linus Torvalds wrote:
> > > > > On Mon, 23 Oct 2023 at 13:26, Dave Chinner wrote:
> > > > > >
> > > > > > The problem is the first read request after a modification has been
> > > > > > made. That is causing relatime to see mtime > atime and triggering
> > > > > > an atime update. XFS sees this, does an atime update, and in
> > > > > > committing that persistent inode metadata update, it calls
> > > > > > inode_maybe_inc_iversion(force = false) to check if an iversion
> > > > > > update is necessary. The VFS sees I_VERSION_QUERIED, and so it bumps
> > > > > > i_version and tells XFS to persist it.
> > > > >
> > > > > Could we perhaps just have a mode where we don't increment i_version
> > > > > for just atime updates?
> > > > >
> > > > > Maybe we don't even need a mode, and could just decide that atime
> > > > > updates aren't i_version updates at all?
> > > >
> > > > We do that already - in memory atime updates don't bump i_version at
> > > > all.
> > > > The issue is the rare persistent atime update requests that
> > > > still happen - they are the ones that trigger an i_version bump on
> > > > XFS, and one of the relatime heuristics tickles this specific issue.
> > > >
> > > > If we push the problematic persistent atime updates to be in-memory
> > > > updates only, then the whole problem with i_version goes away....
> > > >
> > > > > Yes, yes, it's obviously technically an "inode modification", but does
> > > > > anybody actually *want* atime updates with no actual other changes to
> > > > > be version events?
> > > >
> > > > Well, yes, there was. That's why we defined i_version in the on disk
> > > > format this way well over a decade ago. It was part of some deep
> > > > dark magical HSM beans that allowed the application to combine
> > > > multiple scans for different inode metadata changes into a single
> > > > pass. atime changes were one of the things it needed to know about
> > > > for tiering and space scavenging purposes....
> > > >
> > >
> > > But if this is such an ancient mystical program, why do we have to
> > > keep this XFS behavior in the present?
> > > BTW, is this the same HSM whose DMAPI ioctls were deprecated
> > > a few years back?
>
> Drop the attitude, Amir.
>
> That "ancient mystical program" is this:
>
> https://buy.hpe.com/us/en/enterprise-solutions/high-performance-computing-solutions/high-performance-computing-storage-solutions/hpc-storage-solutions/hpe-data-management-framework-7/p/1010144088
>

Sorry for the attitude, Dave. I somehow got the impression that you
were talking about a hypothetical old program that may be out of use.
I believe that Jeff and Linus got the same impression...

> Yup, that product is backed by a proprietary descendent of the Irix
> XFS code base XFS that is DMAPI enabled and still in use today. It's
> called HPE XFS these days....
>

What do you mean?
Do you mean that the HPE product uses patched XFS?
If so, why is that an upstream concern?
Upstream xfs indeed preserves di_dmstate and di_dmevmask, but it does not
change those state members when file changes happen.

So if mounting an HPE XFS disk with an upstream kernel is not going to
record DMAPI state changes, does it matter if upstream xfs does not update
di_changecount on atime change?

Maybe I did not understand the situation w.r.t. HPE XFS.

> > > I mean, I understand that you do not want to change the behavior of
> > > i_version update without an opt-in config or mount option - let the distro
> > > make that choice.
> > > But calling this an "on-disk format change" is a very long stretch.
>
> Telling the person who created, defined and implemented the on disk
> format that they don't know what constitutes a change of that
> on-disk format seems kinda Dunning-Kruger to me....
>

OK. I will choose my words more carefully:

I still do not understand, from everything that you have told us so far,
including the mention of the specific product above,
why not updating di_changecount on atime update constitutes
an on-disk format change and not a runtime behavior change.

You also did not address my comment that xfs_repair does not
update di_changecount on any inode changes, to the best of my
code reading abilities.

> There are *lots* of ways that di_changecount is now incompatible
> with the VFS change counter. That's now defined as "i_version should
> only change when [cm]time is changed".
>
> di_changecount is defined to be a count of the number of changes
> made to the attributes of the inode. It's not just atime at issue
> here - we bump di_changecount when we make any inode change, including
> background work that does not otherwise change timestamps. e.g.
> allocation at writeback time, unwritten extent conversion, on-disk
> EOF extension at IO completion, removal of speculative
> pre-allocation beyond EOF, etc.
>

I see. Does xfs update ctime on all those inode block map changes?
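
To make the two definitions quoted above concrete, here is a minimal toy
model in Python. It is a userspace illustration, not kernel code. It assumes
only what the quoted text states: di_changecount moves on every inode
change, while the newer i_version definition moves only when ctime or mtime
changes. The class name, field names and event labels are hypothetical and
chosen only for this sketch.

```python
from dataclasses import dataclass

# Toy userspace model, not kernel code. The event labels are hypothetical
# names for the kinds of change mentioned in the thread.
CMTIME_EVENTS = {"write", "chmod"}  # assumed to change ctime and/or mtime
OTHER_EVENTS = {                    # changes that leave [cm]time alone
    "atime_update",
    "writeback_alloc",
    "unwritten_conversion",
    "eof_prealloc_removal",
}


@dataclass
class ToyCounters:
    di_changecount: int = 0  # on-disk definition: counts every inode change
    cmtime_version: int = 0  # newer definition: moves only with [cm]time

    def apply(self, event: str) -> None:
        """Record one inode change under both counter definitions."""
        if event not in CMTIME_EVENTS | OTHER_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self.di_changecount += 1
        if event in CMTIME_EVENTS:
            self.cmtime_version += 1
```

Feeding one "write" followed by an "atime_update" and two background
events leaves di_changecount at 4 and cmtime_version at 1, which is the
divergence under discussion.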
> IOWs, di_changecount was never defined as a linux "i_version"
> counter, regardless of the fact we originally were able to implement
> i_version with it - all extra bumps to di_changecount were not
> important to the users of i_version for about a decade.
>
> Unfortunately, the new i_version definition is very much
> incompatible with the existing di_changecount definition and that's
> the underlying problem here. i.e. the problem is not that we bump
> i_version on atime, it's that di_changecount is now completely
> incompatible with the new i_version change semantics.
>
> To implement the new i_version semantics exactly, we need to add a
> new field to the inode to hold this information.
> If we change the on disk format like this, then the atime
> problems go away because the new field would not get updated on
> atime updates. We'd still be bumping di_changecount on atime
> updates, though, because that's what is required by the on-disk
> format.
>

I fully agree with you that we should avoid an on-disk format change.
This is exactly the reason that I'm insisting on clarifying how exactly
this semantic change of di_changecount is going to break existing
applications that run on an upstream kernel.

Thanks,
Amir.
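
For reference, the atime problem that Dave describes in the quoted text at
the top of this mail can be sketched as a toy Python model. This is a
simplified userspace illustration, not the kernel implementation. The
`queried` field stands in for I_VERSION_QUERIED, timestamps are plain integer
seconds, and the relatime rule (including the one-day fallback and the `>=`
comparison) is an assumption of this sketch. All names are hypothetical.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # assumed relatime fallback interval, in seconds


@dataclass
class ToyInode:
    atime: int = 0
    mtime: int = 0
    ctime: int = 0
    i_version: int = 0
    queried: bool = False  # stands in for the I_VERSION_QUERIED flag


def relatime_needs_update(inode: ToyInode, now: int) -> bool:
    """Simplified relatime rule (the exact comparison is an assumption)."""
    return (inode.mtime >= inode.atime
            or inode.ctime >= inode.atime
            or now - inode.atime >= DAY)


def query_iversion(inode: ToyInode) -> int:
    """A reader of the counter marks it as queried."""
    inode.queried = True
    return inode.i_version


def maybe_inc_iversion(inode: ToyInode, force: bool = False) -> bool:
    """Bump only if forced or if someone looked since the last bump."""
    if not force and not inode.queried:
        return False
    inode.i_version += 1
    inode.queried = False
    return True


def do_write(inode: ToyInode, now: int) -> None:
    inode.mtime = inode.ctime = now
    maybe_inc_iversion(inode)


def do_read(inode: ToyInode, now: int) -> bool:
    """Return True if this read ended up bumping i_version."""
    if relatime_needs_update(inode, now):
        inode.atime = now
        # Behaviour described in the thread: the persistent atime update
        # goes through the same "maybe increment" check as other changes.
        return maybe_inc_iversion(inode)
    return False
```

In this model, a write followed by a version query and then a first read
bumps the counter a second time even though no data changed; a second read
shortly afterwards does not, because relatime no longer asks for an atime
update.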