From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23263C00A8F for ; Mon, 23 Oct 2023 23:26:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6766B6B0161; Mon, 23 Oct 2023 19:26:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 626A86B0162; Mon, 23 Oct 2023 19:26:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 51D956B0163; Mon, 23 Oct 2023 19:26:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 3FC436B0161 for ; Mon, 23 Oct 2023 19:26:34 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 0CFAFC0173 for ; Mon, 23 Oct 2023 23:26:34 +0000 (UTC) X-FDA: 81378312708.24.3CBAA41 Received: from mail-pf1-f180.google.com (mail-pf1-f180.google.com [209.85.210.180]) by imf15.hostedemail.com (Postfix) with ESMTP id 10985A0006 for ; Mon, 23 Oct 2023 23:26:31 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=XL1wAfNi; dmarc=pass (policy=quarantine) header.from=fromorbit.com; spf=pass (imf15.hostedemail.com: domain of david@fromorbit.com designates 209.85.210.180 as permitted sender) smtp.mailfrom=david@fromorbit.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1698103592; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=G+Gux9V9VrIJG9Key4Kir8qGYfaDp/xU9ex2KP3eXys=; b=SQoXQYCPp/SJbXurP0oPUbP8hsHKhhTrRklFeCpe0WavoHomMfSei7FZy+/gVuu6Bo0vhX v4wN84/npCpiGWGypGzONq3FEvUjgrS4UrqhaENfm4uXln7mnBfa18+NRF3zbs5iubzvbO 0idBoPyfUKh+FsWe8ElF9btWtmRc+8k= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=XL1wAfNi; dmarc=pass (policy=quarantine) header.from=fromorbit.com; spf=pass (imf15.hostedemail.com: domain of david@fromorbit.com designates 209.85.210.180 as permitted sender) smtp.mailfrom=david@fromorbit.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1698103592; a=rsa-sha256; cv=none; b=cL5reImHUyltKheVw8LQLFqkMj36iqf+oaNo2mvZGhqVJmHZZ8Otb0eJKMAovVMVNzoQp+ GoRYIv5s6gh3RmDxr9MNr+rb9GSvWijlD0b5sfaAet5lZQm3WKfaFjysl5j3BSY7Lm4Piz 0W5GNqqEqb6+koN/jOaOME5R0N0sUuY= Received: by mail-pf1-f180.google.com with SMTP id d2e1a72fcca58-6b36e1fcee9so3192193b3a.3 for ; Mon, 23 Oct 2023 16:26:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1698103591; x=1698708391; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=G+Gux9V9VrIJG9Key4Kir8qGYfaDp/xU9ex2KP3eXys=; b=XL1wAfNijNKEDMSilluYua9LqkGCk0r4lbm9ChbQlKjSvEL1rB5Ifviofo5KB9l207 5/qdeoQjDf1g2Cd6+9wG9b55uKdxEkQEuF29hc5aEFkVgP/qwmZ8j4mtM9YpSv3pd3e0 irHnR6Z869YTUVXLM49wqwJXm4PCUgTVw4vK0j+4JKS9zyE68LZMBGEbogaFDTq1jO0R TIvr1XsTweNIJaIx7kbs8WRVlMfOB3c+5zKKD2Zpj+aAUzoRcGi0oCIFOcJ00u7PKvFc GJrji7im7GMY6lDbi7b39zB2QXpYlXrou8mSqF0ccYVD3aYlrda7YkZQpA4DqIWGNUKx vIAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698103591; x=1698708391; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=G+Gux9V9VrIJG9Key4Kir8qGYfaDp/xU9ex2KP3eXys=; b=n3snc/FqJtH739wCOWPtQQ5KNhIu+O6CNEwr3d0Y1PrJAKkFzFKUSIEQzztNuAPcFJ J7CoSLKd1y2az2Uij3iWDDSCJVIr9CafKr4noBgAy8tRt7nznya5kzISpuZY48DEmLqU NJ8FrjEoGIy2R49ojP8lqTqTG0pQsvp775w9KHqQ4wUzz7l5tOQcDzQGoK2ca9npQuBD gPB7GS/i8t2V22cqsCmV12aVHlaCiOfpwjNikzqmb2Bn/Z62ojMA6wFh1LJnerSnifp6 4qdMau99dZqCg4WCSfMwDtKj5kF5t9IjkjvZ7tICUpqZ1IQwYokDaSTBBfN1uCreEQ1H Orsg== X-Gm-Message-State: AOJu0YzlQXfRgFr7ojRVU01hN3wM09CZf1jCYhAr3HsXdKPK3efHVy2g sPWh//Qt2ktm/u8zTpydZnOcvw== X-Google-Smtp-Source: AGHT+IFbvK8ZgyXM/nA60wNancq6H/LuRCiF/m3G0/gTTq4ADSx4A3l6pnfh9u5eBcKsHipLBdZx2g== X-Received: by 2002:a05:6a00:18a9:b0:692:780a:de90 with SMTP id x41-20020a056a0018a900b00692780ade90mr9935913pfh.30.1698103590604; Mon, 23 Oct 2023 16:26:30 -0700 (PDT) Received: from dread.disaster.area (pa49-180-20-59.pa.nsw.optusnet.com.au. [49.180.20.59]) by smtp.gmail.com with ESMTPSA id w18-20020aa78592000000b006bee5ad4efasm6715066pfn.67.2023.10.23.16.26.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Oct 2023 16:26:29 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1qv4JX-0034Zl-06; Tue, 24 Oct 2023 10:26:27 +1100 Date: Tue, 24 Oct 2023 10:26:27 +1100 From: Dave Chinner To: Jeff Layton Cc: Kent Overstreet , Christian Brauner , Linus Torvalds , Alexander Viro , John Stultz , Thomas Gleixner , Stephen Boyd , Chandan Babu R , "Darrick J. Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Amir Goldstein , Jan Kara , David Howells , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-mm@kvack.org, linux-nfs@vger.kernel.org Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing Message-ID: References: <5f96e69d438ab96099bb67d16b77583c99911caa.camel@kernel.org> <20231019-fluor-skifahren-ec74ceb6c63e@brauner> <0a1a847af4372e62000b259e992850527f587205.camel@kernel.org> <61b32a4093948ae1ae8603688793f07de764430f.camel@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <61b32a4093948ae1ae8603688793f07de764430f.camel@kernel.org> X-Rspamd-Queue-Id: 10985A0006 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: 4mz9ichzwegr51bydg5agaxxpzym8exf X-HE-Tag: 1698103591-241175 X-HE-Meta: U2FsdGVkX1/S+wOMWAyLTKUVR1ITLybXVQ3IrkPz/hZM6ZSYQee8i1bEdzDsndyd5BZN7mAjnVy/i39AtsvubYTGfZ/bv2Gme9QiOZC+eqUlXUQ7Dw0lQEPmjda+lgyivNcDo/Euv9Nl+3SX1AuW3+VPzIZx/Nou1j57FgBX6Cf1AH+ghIR0W/J1Bnj9qWuE95czgGqHhR+ZIwWB8Y6Y2JPjV10p6xGVcAjOhDXx+5tH/zG6E5A0ywIEVyiq/lFZZSZhjp/tYvIAZby77DWJ1/XotuOaS6f4i22ijMJDkOkuXMAa5GSP5QsTc6r3FQZmNfVYfv961Bbb5gGNKwCbdL2hKg2Li5cTBf3+itrvGn+L2rxw4+xRlLyiCZc4XGb0adwBk/AhPoSjegAC07Y2dOcS011gdYwZm0HFaGHf3dU4ha3CuQl7hX1Exr8wdYB0f28D8vjQmmXBOWrFgs441GiIk/JE81fRIAWhG5OP6oql9CtVTUJ/IBISyl/ilZDs3PRpHmhSXPaaj2bTxOtwAAQOtgC4cBcJuXnhlxgBSDQOXv31XJYFM8kAI3UaUQuwLLhKBnYK41iruzRqXvjzu/KwrmS42ml/3ALPyZOdhcwskdQO8LyBnHfG5sERI3Ksuhda8ZVGRe5yiuqQoaRl+aJFGd/yvolToVKHQdCx9R+3N+0maRbKvYymhuYOPhNC1lFH3+wGq+i0oH6Zvdgxyus/wmaCGNLYYIOZKAWnQh8rvZJcm8nglLviWvOQ9qi0fSYTkkyvaUCeIJmY1pWvs59+1lx/wRrWh1H6nfv8yqcueblUuVVCsr9Pdv4e7PIo1EzaVqVzVMjCLw2x3KFWQbfLPgrPAkah8aUFpTB/+GDqtJLRicvWJkP7pvIXUDQBAFyeyBKVXuFWbwwCG5PXLlWv2jIUYejI78CwfHNczm76WYsH+3fy8FZ/FHnjttnZ4cOAWR/fuBFyr78htbj iLDcqhzm IO01fteaYrPGhfJ6crgwCo2c25mDe3fZE9D95PsOcny+0tJT4FOXOJg3THA/SAtwqnZqH5UZpp2OwhwobsMc0RjP004sc54BtjAf/mSY2vWk6SwHQV5wt1Fr6A9w9TafMj4EG7p7gKzODGpDrkkfhlcK7emTHt8xzWLRDpF75HNayarh++LmPir3iVI8zZxbtfITtAkRZTA9VI0eFIcBYb9KjB53ecMoNlGA/3zBEKPs066a7CCJzBlcHcjl3xttPT2CQAP2KSjvhnyKR2WRlKutM4EgWYPSY3WqSjhdXHALXLmzXHEf05CDCmIgU45R7VudLEM6ujPSkC2RCnP8XFqXojF3UD+M6+BB2ZuPEcha/2uBWRHHpEK6ttLsh1esW9mf0r04Zwk+2nJme36nTob5FNJSnedHcDTvrHoa6AGJfK7y+goqcjQ2gvMJUqUAhZUz/ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Oct 23, 2023 at 10:45:21AM -0400, Jeff Layton wrote: > On Mon, 2023-10-23 at 09:17 +1100, Dave Chinner wrote: > > All I'm suggesting is that rather than using mount options for > > noatime-like behaviour for NFSD accesses, we actually have the nfsd > > accesses say "we'd like pure atime updates without iversion, please". > > > > Keep in mind that XFS does actually try to avoid bumping i_version > > on pure timestamp updates - we carved that out a long time ago (see > > the difference in XFS_ILOG_CORE vs XFS_ILOG_TIMESTAMP in > > xfs_vn_update_time() and xfs_trans_log_inode()) so that we could > > optimise fdatasync() to ignore timestamp updates that occur as a > > result of pure data overwrites. > > > > Hence XFS only bumps i_version for pure timestamp updates if the > > iversion queried flag is set. IOWs, XFS it is actually doing exactly > > what the VFS iversion implementation is telling it to do with > > timestamp updates for non-core inode metadata updates. > > > > That's the fundamental issue here: nfsd has set VFS state that tells > > the filesystem to "bump iversion on next persistent inode change", > > but the nfsd then runs operations that can change non-critical > > persistent inode state in "query-only" operations. It then expects > > filesystems to know that it should ignore the iversion queried state > > within this context. However, without external behavioural control > > flags, filesystems cannot know that an isolated metadata update has > > context specific iversion behavioural constraints. > > > Hence fixing this is purely a VFS/nfsd i_version implementation > > problem - if the nfsd is running a querying operation, it should > > tell the filesystem that it should ignore iversion query state. If > > nothing the application level cache cares about is being changed > > during the query operation, it should tell the filesystem to ignore > > iversion query state because it is likely the nfsd query itself will > > set it (or have already set it itself in the case of compound > > operations). > > > > This does not need XFS on-disk format changes to fix. This does not > > need changes to timestamp infrastructure to fix. We just need the > > nfsd application to tell us that we should ignore the vfs i_version > > query state when we update non-core inode metadata within query > > operation contexts. > > > > I think you're missing the point of the problem I'm trying to solve. > I'm not necessarily trying to guard nfsd against its own accesses. The > reads that trigger an eventual atime update could come from anywhere -- > nfsd, userland accesses, etc. > > If you are serving an XFS filesystem, with the (default) relatime mount > option, then you are guaranteed that the clients will invalidate their > cache of a file once per day, assuming that at least one read was issued > against the file during that day. > > That read will cause an eventual atime bump to be logged, at which point > the change attribute will change. The client will then assume that it > needs to invalidate its cache when it sees that change. > > Changing how nfsd does its own accesses won't fix anything, because the > problematic atime bump can come from any sort of read access. I'm not missing the point at all - as I've said in the past I don't think local vs remote access is in any way relevant to the original problem that needs to be solved. If the local access is within the relatime window, it won't cause any persistent metadata change at all. If it's outside the window, then it's no different to the NFS client reading data from the server outside the window. If it's the first access after a NFS client side modification, then it's just really bad timing but it isn't likely to be a common issue. Hence I just don't think it matters on bit, and we can address the 24 hour problem separately to the original problem that still needs to be fixed. The problem is the first read request after a modification has been made. That is causing relatime to see mtime > atime and triggering an atime update. XFS sees this, does an atime update, and in committing that persistent inode metadata update, it calls inode_maybe_inc_iversion(force = false) to check if an iversion update is necessary. The VFS sees I_VERSION_QUERIED, and so it bumps i_version and tells XFS to persist it. IOWs, XFS is doing exactly what the VFS is telling it to do with i_version during the persistent inode metadata update that the VFS told it to make. This, however, is not the semantics that the *nfsd application* wants. It does not want i_version to be updated when it is running a data read operation despite the fact the VFS is telling the filesystem it needs to be updated. What we need to know is when the inode is being accessed by the nfsd so we can change the in-memory timestamp update behaviour appropriately. We really don't need on-disk format changes - we just need to know that we're supposed to do something special with pure timestamp updates because i_version needs to be behave in a manner compatible with the new NFS requirements.... We also don't need generic timestamp infrastructure changes to do this - the multi-grained timestamp was a neat idea for generic filesystem support of the nfsd i_version requirements, but it's collapsed under the weight of complexity. There are simpler ways individual filesystems can do the right thing, but to do that we need to know that nfsd has actively referenced the inode. How we get that information is what I want to resolve, the filesystem should be able to handle everything else in memory.... Perhaps we can extract I_VERSION_QUERIED as a proxy for nfsd activity on the inode rather than need a per-operation context? Is that going to be reliable enough? Will that cause problems for other applications that want to use i_version for their own purposes? Cheers, Dave. -- Dave Chinner david@fromorbit.com