From: Jeff Layton <jlayton@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>,
John Stultz <jstultz@google.com>,
Stephen Boyd <sboyd@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Chandan Babu R <chandan.babu@oracle.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chuck Lever <chuck.lever@oracle.com>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v8 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper
Date: Mon, 30 Sep 2024 16:12:11 -0400 [thread overview]
Message-ID: <d3736f199973c0c4284f421850bdf8852dc193d8.camel@kernel.org> (raw)
In-Reply-To: <874j5x0vwq.ffs@tglx>
On Mon, 2024-09-30 at 21:43 +0200, Thomas Gleixner wrote:
> On Thu, Sep 19 2024 at 18:50, Jeff Layton wrote:
> > The fix for this is to establish a floor value for the coarse-grained
> > clock. When stamping a file with a fine-grained timestamp, we update
> > the floor value with the current monotonic time (using cmpxchg). Then
> > later, when a coarse-grained timestamp is requested, check whether the
> > floor is later than the current coarse-grained time. If it is, then the
> > kernel will return the floor value (converted to realtime) instead of
> > the current coarse-grained clock. That allows us to maintain the
> > ordering guarantees.
> >
> > My original implementation of this tracked the floor value in
> > fs/inode.c (also using cmpxchg), but that caused a performance
> > regression, mostly due to multiple calls into the timekeeper functions
> > with seqcount loops. By adding the floor to the timekeeper we can get
> > that back down to 1 seqcount loop.
> >
> > Let me know if you have more questions about this, or suggestions about
> > how to do this better. The timekeeping code is not my area of expertise
> > (obviously) so I'm open to doing this a better way if there is one.
>
> The comments I made about races and the clock_settime() inconsistency
> vs. the change log aside, I don't see room for improvement there.
>
> What worries me is the atomic_cmpxchg() under contention on large
> machines, but as it is not a cmpxchg() loop it might be not completely
> horrible.
>
Thanks for taking a look.
The code does try to avoid requesting a fine-grained timestamp unless
there is no other option. In practice, that means that a
write()+stat()+write() has to occur within the same coarse-grained
timer tick. That obviously does happen, but I'm hoping it's not very
frequent on real-world workloads.
The main place where we see a performance hit from this set is the
will-it-scale pipe test, which does 1-byte writes and reads on a pipe
as fast as possible.
The previous patchset tracked the floor inside fs/inode.c, and each
timestamp update on the pipe inode could make multiple trips into the
timekeeper seqcount loops. That added extra barriers and slowed things
down.
This patchset moves the code into the timekeeper, which means only a
single barrier. That gets the cost down, but it's not entirely free.
There is some small cost to dealing with the floor, regardless.
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2024-09-30 20:12 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-14 17:07 [PATCH v8 00/11] fs: multigrain timestamp redux Jeff Layton
2024-09-14 17:07 ` [PATCH v8 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper Jeff Layton
2024-09-14 20:10 ` John Stultz
2024-09-14 23:14 ` Jeff Layton
2024-09-16 10:12 ` Thomas Gleixner
2024-09-16 10:32 ` Thomas Gleixner
2024-09-16 10:57 ` Jeff Layton
2024-09-30 19:16 ` Thomas Gleixner
2024-09-30 19:37 ` Jeff Layton
2024-09-30 20:19 ` Thomas Gleixner
2024-09-30 20:53 ` Jeff Layton
2024-09-30 21:35 ` Thomas Gleixner
2024-10-01 9:45 ` Jeff Layton
2024-10-01 12:45 ` Thomas Gleixner
2024-10-02 12:41 ` Jeff Layton
2024-09-19 16:50 ` Jeff Layton
2024-09-30 19:43 ` Thomas Gleixner
2024-09-30 20:12 ` Jeff Layton [this message]
2024-09-30 19:13 ` Thomas Gleixner
2024-09-30 19:27 ` Jeff Layton
2024-09-30 20:15 ` Thomas Gleixner
2024-09-14 17:07 ` [PATCH v8 02/11] fs: add infrastructure for multigrain timestamps Jeff Layton
2024-09-14 17:07 ` [PATCH v8 03/11] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2024-09-14 17:07 ` [PATCH v8 04/11] fs: handle delegated timestamps in setattr_copy_mgtime Jeff Layton
2024-09-14 17:07 ` [PATCH v8 05/11] fs: tracepoints around multigrain timestamp events Jeff Layton
2024-09-15 8:21 ` Steven Rostedt
2024-09-14 17:07 ` [PATCH v8 06/11] fs: add percpu counters for significant " Jeff Layton
2024-09-16 10:20 ` Thomas Gleixner
2024-09-14 17:07 ` [PATCH v8 07/11] Documentation: add a new file documenting multigrain timestamps Jeff Layton
2024-09-16 1:01 ` Bagas Sanjaya
2024-09-19 16:53 ` Jeff Layton
2024-09-14 17:07 ` [PATCH v8 08/11] xfs: switch to " Jeff Layton
2024-09-14 17:07 ` [PATCH v8 09/11] ext4: " Jeff Layton
2024-09-14 17:07 ` [PATCH v8 10/11] btrfs: convert " Jeff Layton
2024-09-14 17:07 ` [PATCH v8 11/11] tmpfs: add support for " Jeff Layton
2024-09-26 16:59 ` [PATCH v8 00/11] fs: multigrain timestamp redux Randy Dunlap
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d3736f199973c0c4284f421850bdf8852dc193d8.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=chandan.babu@oracle.com \
--cc=chuck.lever@oracle.com \
--cc=clm@fb.com \
--cc=corbet@lwn.net \
--cc=djwong@kernel.org \
--cc=dsterba@suse.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=jstultz@google.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sboyd@kernel.org \
--cc=tglx@linutronix.de \
--cc=tytso@mit.edu \
--cc=vadim.fedorenko@linux.dev \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox