From: Linus Torvalds
Date: Tue, 2 Jul 2024 11:41:44 -0700
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
To: Mateusz Guzik
Cc: Christian Brauner, kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List, linux-kernel@vger.kernel.org, ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
References: <202406270912.633e6c61-oliver.sang@intel.com>
On Tue, 2 Jul 2024 at 10:58, Mateusz Guzik wrote:
>
> Suppose the rcu fast path lookup reads the dentry seqc, then does all
> the legitimize_mnt and other work. Everything, except modifying the
> lockref.
> The caller is given a mnt to put (per-cpu scalable), dentry
> seqc read before any of the path validation and an indication this is
> rcu.

Yes.

> Then after whatever is done if the seqc still matches this is the same
> as if there was lockref get/put around it.

So this is partly why I was thinking of a callback.

That "check sequence number afterwards" is still important. And if it's a callback, it can be done in the path walking code, and it can go on and say "oh, I'll need to redo this without RCU".

If it's a "we returned a dentry under RCU", suddenly the caller has to know this about the name lookup and do the repeating by hand.

And as long as we don't expose it to modules and only use it for "stat()" and friends, I'm ok with it, but I'm just saying that it's all a bit scary.

> The only worry is pointers suddenly going NULL or similar as
> dentry/inode is looked at. To be worked out on per-syscall basis.

We have subtle rules wrt dentry->d_inode. It can indeed become NULL at any time during the RCU walk, since what protects it is the d_lock and the dentry count.

The inode itself is then RCU-free'd, so it will *exist*, but you can't just blindly use dentry->d_inode itself while under RCU.

Which is why it's cached in 'struct nameidata', and we validate it with nd->seq when it's loaded. And why things like may_lookup() use nd->inode, not the dentry.

And that's another rule that we probably should aim to not have escape from the path walking as an interface. Because it's much too easy to do

        struct inode *inode = d_backing_inode(path->dentry);

but that's just wrong during the RCU path walk.

Again, having this be a callback during the walk would avoid issues like this. The callback can just pass in the separate inode pointer.
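A minimal sketch of the callback idea, written as a single-threaded userspace C11 model rather than kernel code. Every name here (`struct dentry`, `walk_rcu`, `WALK_CB_DONE`, the `seq_*` helpers, the callbacks) is a hypothetical stand-in, only loosely mirroring the kernel's read_seqcount_begin()/read_seqcount_retry() pattern and not the real fs/namei.c interface. It shows the two rules from the thread: the inode pointer is loaded once and validated against the sequence count (instead of the caller re-reading d_inode later), and a sequence mismatch after the callback becomes -ECHILD so the work is redone. In this model the inode is simply never freed, standing in for the RCU-delayed freeing that keeps it alive in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct inode { long size; };

struct dentry {
    atomic_uint seq;                 /* even: stable; odd: writer in progress */
    struct inode *_Atomic d_inode;   /* may become NULL under a writer */
};

/* Hypothetical "the callback already did the work" result. */
#define WALK_CB_DONE 1

typedef int (*walk_cb)(struct inode *inode, void *arg);

/* Writer side: the sequence is odd while the dentry is being changed. */
static void seq_write_begin(struct dentry *d) {
    unsigned old = atomic_fetch_add_explicit(&d->seq, 1, memory_order_relaxed);
    assert((old & 1) == 0);          /* no nested writers in this model */
    atomic_thread_fence(memory_order_release);
}

static void seq_write_end(struct dentry *d) {
    atomic_fetch_add_explicit(&d->seq, 1, memory_order_release);
}

/* Reader side: wait for an even sequence and snapshot it. */
static unsigned seq_read_begin(struct dentry *d) {
    unsigned s;
    while ((s = atomic_load_explicit(&d->seq, memory_order_acquire)) & 1)
        ;                            /* writer in progress: spin */
    return s;
}

/* Nonzero if a writer ran since seq_read_begin() returned 'start'. */
static int seq_read_retry(struct dentry *d, unsigned start) {
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&d->seq, memory_order_relaxed) != start;
}

/*
 * One RCU-mode step: load d_inode once, validate it, hand that cached
 * pointer to the callback, then re-check the sequence. A mismatch means the
 * callback may have seen stale data, so its result is discarded and -ECHILD
 * tells the caller to redo the lookup without RCU.
 */
static int walk_rcu(struct dentry *d, walk_cb cb, void *arg) {
    unsigned seq = seq_read_begin(d);
    struct inode *inode = atomic_load_explicit(&d->d_inode, memory_order_relaxed);

    if (seq_read_retry(d, seq))
        return -ECHILD;
    if (inode == NULL)
        return -ENOENT;              /* stable negative dentry */

    int ret = cb(inode, arg);        /* never re-reads d->d_inode */
    if (seq_read_retry(d, seq))
        return -ECHILD;
    return ret ? ret : WALK_CB_DONE;
}

/* Example callback: copy "stat" data out of the validated inode. */
static int stat_size_cb(struct inode *inode, void *arg) {
    *(long *)arg = inode->size;
    return 0;
}

/* Demo-only callback: simulates a concurrent rename completing mid-walk. */
static int racing_cb(struct inode *inode, void *arg) {
    (void)inode;
    struct dentry *d = arg;
    seq_write_begin(d);
    seq_write_end(d);
    return 0;
}
```

Because the callback receives the inode pointer that was validated together with the sequence count, it has no reason to call something like d_backing_inode() on the dentry itself, which is the mistake the thread warns about.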
And then a sequence point failure will return -ECHILD and do the walk again, while a callback success with all the sequence numbers matching would return -ECALLBACK or whatever, so that the caller would know "the stat information was already successfully completed by the callback".

Anyway, that was my handwavy "this is why I was thinking of a callback" thing.

But it's also an example of just how nasty and subtle this all is.

But I'm convinced this is all eminently *solvable*. There's nothing fundamental here. Just a lot of small nasty details.

               Linus