Date: Thu, 2 Nov 2023 10:29:04 +1100
From: Dave Chinner
To: Trond Myklebust
Cc: torvalds@linux-foundation.org, jack@suse.cz, clm@fb.com,
 josef@toxicpanda.com, jstultz@google.com, djwong@kernel.org,
 brauner@kernel.org, chandan.babu@oracle.com, hughd@google.com,
 linux-xfs@vger.kernel.org, akpm@linux-foundation.org, dsterba@suse.com,
 linux-kernel@vger.kernel.org, jlayton@kernel.org, tglx@linutronix.de,
 linux-mm@kvack.org, linux-nfs@vger.kernel.org, tytso@mit.edu,
 viro@zeniv.linux.org.uk, linux-ext4@vger.kernel.org, amir73il@gmail.com,
 linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 adilger.kernel@dilger.ca, kent.overstreet@linux.dev, sboyd@kernel.org,
 dhowells@redhat.com, jack@suse.de
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain
 timestamp handing

On Wed, Nov 01, 2023 at 09:34:57PM +0000, Trond Myklebust wrote:
> On Wed, 2023-11-01 at 10:10 -1000, Linus Torvalds wrote:
> > The above does not expose *any* changes to timestamps to users, and
> > should work across a wide variety of filesystems, without requiring
> > any special code from the filesystem itself.
> >
> > And now please all jump on me and say "No, Linus, that won't work,
> > because XYZ".
> >
> > Because it is *entirely* possible that I missed something truly
> > fundamental, and the above is completely broken for some obvious
> > reason that I just didn't think of.
> >
>
> My client writes to the file and immediately reads the ctime. A 3rd
> party client then writes immediately after my ctime read.
> A reboot occurs (maybe minutes later), then I re-read the ctime, and
> get the same value as before the 3rd party write.
>
> Yes, most of the time that is better than the naked ctime, but not
> across a reboot.

This sort of "crash immediately after 3rd party data write" scenario
has never worked properly, even with i_version.
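
For anyone wanting to poke at the sequence Trond describes from
userspace, here is a rough sketch, not a test case from this thread.
It assumes a Linux system with glibc's statx() wrapper; the helper
names sample_ctime() and run_scenario() are made up for illustration,
and the second write merely stands in for the 3rd party client.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sample the ctime of an open file via statx(); returns 0 on success. */
static int sample_ctime(int fd, struct statx_timestamp *out)
{
	struct statx stx;

	/* An empty path with AT_EMPTY_PATH makes statx() operate on fd. */
	if (statx(fd, "", AT_EMPTY_PATH, STATX_CTIME, &stx) != 0)
		return -1;
	if (!(stx.stx_mask & STATX_CTIME))
		return -1;
	*out = stx.stx_ctime;
	return 0;
}

/*
 * Hypothetical helper: write, sample ctime, write again, sample again.
 * All writes are plain buffered writes. No O_SYNC/O_DSYNC and no
 * fsync() are used, so nothing here asks the filesystem to persist
 * the data or the timestamp update.
 */
static int run_scenario(const char *path,
			struct statx_timestamp *before,
			struct statx_timestamp *after)
{
	int ret = -1;
	int fd = open(path, O_WRONLY | O_APPEND);

	if (fd < 0)
		return -1;

	/* "My client": write, then immediately read the ctime. */
	if (write(fd, "a", 1) != 1 || sample_ctime(fd, before) != 0)
		goto out;

	/* Stand-in for the 3rd party write right after the sample. */
	if (write(fd, "b", 1) != 1 || sample_ctime(fd, after) != 0)
		goto out;

	ret = 0;
out:
	close(fd);
	return ret;
}
```

Calling run_scenario() on an existing file fills in the two samples.
On a filesystem with coarse timestamps they may well compare equal,
and neither the second write nor its timestamp update has been
requested to reach stable storage at that point.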

The issue is that 3rd party (local) buffered writes or metadata
changes do not require any integrity or metadata stability operations
to be performed by the filesystem unless O_[D]SYNC is set on the fd,
RWF_[D]SYNC is set on the IO, or f{data}sync() is performed on the
file.

Hence no local filesystem currently persists i_version or ctime
outside of operations with specific data integrity semantics.

nfsd based modifications have application specific persistence
requirements, and these are met by the nfsd calling ->commit_metadata
prior to returning the operation result to the client. This is what
persists i_version/timestamp changes that were made during the nfsd
operation - this persistence behaviour is not driven by the local
filesystem.

IOWs, this "change attribute failure" scenario is an existing problem
with the current i_version implementation. It has always been flawed
in this way, but this didn't matter a decade ago because its only
purpose (and user) was nfsd, and that had the required persistence
semantics to hide these flaws within the application's context.

Now that we are trying to expose i_version as a "generic change
attribute", these persistence flaws get exposed because local
filesystem operations do not have the same enforced persistence
semantics as the NFS server.

This is another reason I want i_version to die.

What we need is a clear set of well defined semantics around statx
change attribute sampling. Correct crash-recovery/integrity behaviour
requires this rule:

  If the change attribute has been sampled, then the next
  modification to the filesystem that bumps the change attribute
  *must* persist the change attribute modification atomically with
  the modification that requires it to change, or submit and complete
  persistence of the change attribute modification before the
  modification that requires it starts.

e.g. a truncate can bump the change attribute atomically with the
metadata changes in a transaction-based filesystem (ext4, XFS, btrfs,
bcachefs, etc).

Data writes are much harder, though. Some filesystem structures can
write data and metadata in a single update, e.g. log structured or COW
filesystems that can mix data and metadata like btrfs. Journalling
filesystems require ordering between journal writes and the data
writes to guarantee the change attribute is persistent before we
write the data. Non-journalling filesystems require inode vs data
write ordering.

Hence I strongly doubt that a persistent change attribute is best
implemented at the VFS - optimal, efficient implementations are
highly filesystem specific regardless of how the change attribute is
encoded in filesystem metadata.

This is another reason I want to change how the inode timestamp code
is structured to call into the filesystem first rather than last.
Different filesystems will need to do different things to persist
a "ctime change counter" attribute correctly and efficiently - it's
not a one-size-fits-all situation....

-Dave.
-- 
Dave Chinner
david@fromorbit.com