From: Mateusz Guzik
Date: Tue, 2 Jul 2024 19:58:02 +0200
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
To: Linus Torvalds
Cc: Christian Brauner, kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List, linux-kernel@vger.kernel.org, ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
References: <202406270912.633e6c61-oliver.sang@intel.com>

On Tue, Jul 2, 2024 at 7:46 PM Mateusz Guzik wrote:
>
> On Tue, Jul 2, 2024 at 7:28 PM Linus Torvalds
> wrote:
> >
> > On Tue, 2 Jul 2024 at 10:03, Mateusz Guzik wrote:
> > >
> > > I was thinking of a different approach.
> > >
> > > A lookup variant which resolves everything and returns the dentry +
> > > information on whether this is rcu mode.
> >
> > That would work equally.
> >
> > But the end result ends up being very similar: you need to hook into
> > that final complete_walk() -> try_to_unlazy() -> legitimize_path() and
> > check a flag whether you actually then do "get_lockref_or_dead()" or
> > not.
> >
>
> Yes, the magic routine to validate if you can pretend the ref was taken
> would wrap it.
>
> > It really *shouldn't* be too bad, but this is just so subtle code that
> > it just takes a lot of care. Even if the patch itself ends up not
> > necessarily being very large.
> >
> > As mentioned, I've looked at it, but it always ended up being _just_
> > scary enough that I never really started doing it.
> >
>
> I implemented something like this as a demo in FreeBSD a few years back,
> and it did not blow up at least. The work did not get committed though
> because I could not be arsed to productize it.
>
> tbf if anything the only shady thing here that I see is that stat et
> al do their work without any locks held or seqc verification in the
> current kernel.
>
> In FreeBSD this was operating directly on vnodes (here one can pretend
> they are inodes). In that system I added sequence counters to the vnode
> itself and any state change like write, setattr, unlink or whatever
> would bump it. Then something like stat could safely read whatever it
> wants in a lockless manner, with the final check for a matching seqc
> indicating nothing changed.
>
> Not having a "someone is messing with the inode" indicator (only with
> a dentry) in Linux is definitely worrisome when pushing RCU further,
> if that's what you meant.
>
> Again, I'm going to poke around if only for kicks when I find the time
> and we will see what happens.

Suppose the rcu fast path lookup reads the dentry seqc, then does all
the legitimize_mnt() and other work. Everything, except modifying the
lockref.

The caller is given a mnt to put (per-cpu, so scalable), the dentry
seqc read before any of the path validation, and an indication that
this is rcu mode.
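Both the FreeBSD vnode scheme in the quoted text and this proposal rest on the same read-side pattern: sample a sequence counter, do the lockless work, and accept the result only if the counter is still the same (and was not odd, i.e. mid-write). A minimal userspace C11 sketch, assuming writers are already serialized by some lock that is not shown; `struct obj` and the `obj_*` helpers are hypothetical stand-ins for an inode/vnode carrying its own counter, not kernel or FreeBSD code. In the kernel the equivalent primitives would be the seqcount API (`read_seqcount_begin()`/`read_seqcount_retry()`).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an inode/vnode with its own sequence counter. */
struct obj {
    _Atomic unsigned seq;  /* even: stable, odd: write in progress */
    _Atomic long size;     /* stand-ins for the state stat() reports */
    _Atomic unsigned mode;
};

/* Writers (write, setattr, unlink, ...) bracket every state change.
 * Assumes writers are serialized by a lock not modelled here. */
static void obj_write_begin(struct obj *o)
{
    unsigned s = atomic_load_explicit(&o->seq, memory_order_relaxed);
    assert((s & 1) == 0); /* no nested or concurrent writer */
    atomic_store_explicit(&o->seq, s + 1, memory_order_relaxed);
    /* Order the odd counter before the data stores that follow. */
    atomic_thread_fence(memory_order_release);
}

static void obj_write_end(struct obj *o)
{
    unsigned s = atomic_load_explicit(&o->seq, memory_order_relaxed);
    /* Publish the data stores together with the new even value. */
    atomic_store_explicit(&o->seq, s + 1, memory_order_release);
}

/* Reader: sample the counter before looking at anything else. */
static unsigned obj_read_begin(struct obj *o)
{
    return atomic_load_explicit(&o->seq, memory_order_acquire);
}

/* Reader: the data read since obj_read_begin() is usable only if no
 * write was in progress then and none has happened since. */
static bool obj_read_valid(struct obj *o, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return (start & 1) == 0 &&
           atomic_load_explicit(&o->seq, memory_order_relaxed) == start;
}

struct stat_snap {
    long size;
    unsigned mode;
};

/* Lockless stat(): false means retry or fall back to a locked path. */
static bool obj_stat_lockless(struct obj *o, struct stat_snap *out)
{
    unsigned s = obj_read_begin(o);
    out->size = atomic_load_explicit(&o->size, memory_order_relaxed);
    out->mode = atomic_load_explicit(&o->mode, memory_order_relaxed);
    return obj_read_valid(o, s);
}
```

The counter only detects that something changed; it does not keep the object alive. That part still has to come from RCU, which is why the worry about pointers going NULL underneath the reader remains.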
Then, after whatever work is done, if the seqc still matches, this is
the same as if there had been a lockref get/put around it.

The only worry is pointers suddenly going NULL or similar as the
dentry/inode is looked at. That has to be worked out on a per-syscall
basis.

Unless I'm missing something.

-- 
Mateusz Guzik
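A single-threaded C model of the caller contract proposed above, to make the control flow concrete. Everything here is hypothetical: `lookup_rcu()`, `lookup_ref()` and `lookup_finish()` are not functions in fs/namei.c, `struct dentry_model` only mimics `d_seq` and the lockref count, and memory ordering is deliberately omitted. It shows the RCU path never touching the dentry refcount, always handing the caller a mount reference to put, and treating an unchanged seq at the end as equivalent to a get/put pair. Falling back to today's refcounted lookup on a mismatch is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; memory ordering intentionally left out. */
struct mnt_model {
    long refs;       /* per-cpu counter in the real thing */
};

struct dentry_model {
    unsigned seq;    /* stands in for d_seq */
    int lockref;     /* stands in for the lockref count */
    long size;       /* something a stat()-like call would report */
};

struct lookup_result {
    struct mnt_model *mnt;       /* always referenced; caller puts it */
    struct dentry_model *dentry;
    unsigned seq;                /* sampled before path validation */
    bool rcu;                    /* true: dentry lockref untouched */
};

/* Fast path: resolve everything but never modify the dentry lockref. */
static bool lookup_rcu(struct mnt_model *m, struct dentry_model *d,
                       struct lookup_result *r)
{
    unsigned s = d->seq;
    if (s & 1)                   /* dentry being changed: give up */
        return false;
    m->refs++;                   /* like legitimize_mnt() succeeding */
    *r = (struct lookup_result){ m, d, s, true };
    return true;
}

/* Slow path: a real lockref get, as lookups do today. */
static void lookup_ref(struct mnt_model *m, struct dentry_model *d,
                       struct lookup_result *r)
{
    m->refs++;
    d->lockref++;
    *r = (struct lookup_result){ m, d, d->seq, false };
}

/* Returns true if the work done since the lookup is valid. In rcu mode
 * an unchanged seq stands in for a lockref get/put pair. */
static bool lookup_finish(struct lookup_result *r)
{
    bool ok = true;
    if (r->rcu) {
        ok = r->dentry->seq == r->seq;
    } else {
        assert(r->dentry->lockref > 0);
        r->dentry->lockref--;
    }
    r->mnt->refs--;              /* the mnt is put in both modes */
    return ok;
}

/* Hypothetical stat()-like caller: try rcu mode, fall back on failure. */
static long stat_size(struct mnt_model *m, struct dentry_model *d)
{
    struct lookup_result r;
    if (lookup_rcu(m, d, &r)) {
        long size = r.dentry->size;
        if (lookup_finish(&r))
            return size;
    }
    lookup_ref(m, d, &r);
    long size = r.dentry->size;
    lookup_finish(&r);
    return size;
}
```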