From: John Stultz
Date: Fri, 13 Sep 2024 12:05:01 -0700
Subject: Re: [PATCH v7 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper
To: Jeff Layton
Cc: Thomas Gleixner, Stephen Boyd, Alexander Viro, Christian Brauner,
    Jan Kara, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
    Jonathan Corbet, Chandan Babu R, "Darrick J. Wong", "Theodore Ts'o",
    Andreas Dilger, Chris Mason, Josef Bacik, David Sterba, Hugh Dickins,
    Andrew Morton, Chuck Lever, Vadim Fedorenko,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
    linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org,
    linux-mm@kvack.org
References: <20240913-mgtime-v7-0-92d4020e3b00@kernel.org>
    <20240913-mgtime-v7-1-92d4020e3b00@kernel.org>

On Fri, Sep 13, 2024 at 11:59 AM John Stultz wrote:
>
> On Fri, Sep 13, 2024 at 6:54 AM Jeff Layton wrote:
> >
> > For multigrain timestamps, we must keep track of the latest timestamp
> > that has ever been handed out, and never hand out a coarse time below
> > that value.
> >
> > Add a static singleton atomic64_t into timekeeper.c that we can use to
> > keep track of the latest fine-grained time ever handed out. This is
>
> Maybe drop "ever" and add "handed out through a specific interface",
> as timestamps can be accessed in a lot of ways that don't keep track
> of what was returned.
>
>
> > tracked as a monotonic ktime_t value to ensure that it isn't affected by
> > clock jumps.
> >
> > Add two new public interfaces:
> >
> > - ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the
> >   coarse-grained clock and the floor time
> >
> > - ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries
> >   to swap it into the floor. A timespec64 is filled with the result.
> >
> > Since the floor is global, we take great pains to avoid updating it
> > unless it's absolutely necessary. If we do the cmpxchg and find that the
> > value has been updated since we fetched it, then we discard the
> > fine-grained time that was fetched in favor of the recent update.
> >
> > To maximize the window of this occurring when multiple tasks are racing
> > to update the floor, ktime_get_coarse_real_ts64_mg returns a cookie
> > value that represents the state of the floor tracking word, and
> > ktime_get_real_ts64_mg accepts a cookie value that it uses as the "old"
> > value when calling cmpxchg().
> >
> > Signed-off-by: Jeff Layton
> > ---
> >  include/linux/timekeeping.h |  4 +++
> >  kernel/time/timekeeping.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+)
> >
> > diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> > index fc12a9ba2c88..cf2293158c65 100644
> > --- a/include/linux/timekeeping.h
> > +++ b/include/linux/timekeeping.h
> > @@ -45,6 +45,10 @@ extern void ktime_get_real_ts64(struct timespec64 *tv);
> >  extern void ktime_get_coarse_ts64(struct timespec64 *ts);
> >  extern void ktime_get_coarse_real_ts64(struct timespec64 *ts);
> >
> > +/* Multigrain timestamp interfaces */
> > +extern u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts);
> > +extern void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie);
> > +
> >  void getboottime64(struct timespec64 *ts);
> >
> >  /*
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 5391e4167d60..ee11006a224f 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
> >  	.base[1] = FAST_TK_INIT,
> >  };
> >
> > +/*
> > + * This represents the latest fine-grained time that we have handed out as a
> > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> > + * realtime clock on an as-needed basis.
> > + */
> > +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> > +
> >  static inline void tk_normalize_xtime(struct timekeeper *tk)
> >  {
> >  	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> > @@ -2394,6 +2401,80 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> >  }
> >  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
> >
> > +/**
> > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> > + * @ts: timespec64 to be filled
> > + *
> > + * Adjust floor to realtime and compare it to the coarse time. Fill
> > + * @ts with the latest one. Returns opaque cookie suitable for passing
> > + * to ktime_get_real_ts64_mg().
> > + */
> > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> > +{
> > +	struct timekeeper *tk = &tk_core.timekeeper;
> > +	u64 floor = atomic64_read(&mg_floor);
> > +	ktime_t f_real, offset, coarse;
> > +	unsigned int seq;
> > +
> > +	WARN_ON(timekeeping_suspended);
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +		*ts = tk_xtime(tk);
> > +		offset = *offsets[TK_OFFS_REAL];
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +	coarse = timespec64_to_ktime(*ts);
> > +	f_real = ktime_add(floor, offset);
> > +	if (ktime_after(f_real, coarse))
> > +		*ts = ktime_to_timespec64(f_real);
> > +	return floor;
> > +}
> > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> > +
> > +/**
> > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > + * @ts:         pointer to the timespec to be set
> > + * @cookie:     opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > + *
> > + * Get a current monotonic fine-grained time value and attempt to swap
> > + * it into the floor using @cookie as the "old" value. @ts will be
> > + * filled with the resulting floor value, regardless of the outcome of
> > + * the swap.
>
> I'd add more detail here to clarify that this can return a coarse
> floor value if the cookie is stale.

Additionally, for these two new interfaces, since they are so
specifically tuned to this particular need in the vfs, it might be
good to add a comment in the kerneldoc here that they are special-case
interfaces for the vfs and should be avoided outside that space.

That would probably alleviate my main worries, and we can polish the
details around cookie or no cookie later if needed.

thanks
-john