From: Mateusz Guzik
Date: Tue, 2 Jul 2024 19:02:56 +0200
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
To: Linus Torvalds
Cc: Christian Brauner, kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List, linux-kernel@vger.kernel.org, ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
References: <202406270912.633e6c61-oliver.sang@intel.com>

On Tue, Jul 2, 2024 at 6:47 PM Linus Torvalds wrote:
>
> On Tue, 2 Jul 2024 at 05:10, Mateusz Guzik wrote:
> >
> > Well there is also the option of going full RCU in the fast path, which
> > I mentioned last time around lockref.
> >
> > This would be applicable at least for stat, fstatx, readlink and access
> > syscalls.
>
> Yes.
> That would be the optimal thing - have some "don't take a lockref
> on the last component at all, because we will finish the use of it
> under RCU".
>
> I looked at that some time ago, and it didn't look _horrendous_ from a
> conceptual standpoint, but the details just got to be nasty.
>
> What I wanted to do was to hook into the "we're still in RCU mode"
> with a callback that stat could set.
>
> And we'd call it at complete_walk() -> try_to_unlazy() ->
> legitimize_path() time just before we do that lockref_get_not_dead()
> thing.
>
> So then the path walkers that are ok with RCU state (ie mostly just
> 'stat()' and friends) could set that callback, and get a callback
> while the path walk is still in RCU mode, and could fill in the stat
> data then and say "I'm done" and we'd never actually finalize the path
> at all, and never do the final lockref_get_not_dead().
>
> Sounds simple in theory. And then when I looked at doing the actual
> code patch, I ended up just running away scared.

I was thinking of a different approach.

A lookup variant which resolves everything and returns the dentry plus
an indication of whether this is RCU mode. If not, the regular handling
plus path_put() sorts it out. If yes, the fast path handling gets
involved.

If a filesystem can provide a custom callback for the regular usage
above, there would be an optional callback for RCU mode as well (and it
would be illegal to only have one). Should this run into any trouble it
can return -EAGAIN, at which point a try_to_get_actual_full_ref() (but
better named) routine is called and it tries to get the actual ref.

Suppose the callback or in-place handling worked out. Then a routine to
validate that nothing changed (at least the dentry seq?) is called.
Should it succeed, that's it; otherwise the entire thing redoes the work
the old-fashioned way.

I have not looked too closely yet, but I think this is very much doable
without much swearing. I am going to look into it after I find some
time, maybe this weekend.
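To make the control flow described above easier to follow, here is a
rough single-threaded userspace model in C. It is only a sketch under
loud assumptions: every name in it (model_dentry, model_lookup,
model_stat and the two example callbacks) is hypothetical and none of it
is kernel API. The "retry" result is modeled as -EAGAIN, the sequence
check is a plain integer comparison rather than a real seqcount, and a
failed validation simply drops to the reference-taking path instead of
redoing the whole lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only. Every name below is hypothetical, not kernel API. */
struct model_dentry {
	unsigned int seq;   /* bumped whenever the dentry changes */
	int refcount;       /* stands in for the lockref count */
	int refs_taken;     /* how many times the slow path took a ref */
	long size;          /* the attribute a stat-like caller wants */
};

struct model_lookup {
	struct model_dentry *dentry;
	unsigned int seq;   /* sequence number sampled during the walk */
	bool rcu;           /* true if the walk stayed in RCU mode */
};

typedef int (*model_op_fn)(const struct model_dentry *d, long *out);

/* Example callback: copy the size out, never fails. */
int model_copy_size(const struct model_dentry *d, long *out)
{
	*out = d->size;
	return 0;
}

/* Example RCU-mode callback that always asks for the slow path. */
int model_always_again(const struct model_dentry *d, long *out)
{
	(void)d;
	(void)out;
	return -EAGAIN;
}

int model_stat(struct model_lookup *lk, model_op_fn rcu_op,
	       model_op_fn ref_op, long *out)
{
	if (lk->rcu) {
		int rcu_err = rcu_op(lk->dentry, out);

		/* Fast path: callback worked and nothing changed under us. */
		if (rcu_err == 0 && lk->dentry->seq == lk->seq)
			return 0;
		/* Anything other than "retry" is treated as a real error. */
		if (rcu_err != 0 && rcu_err != -EAGAIN)
			return rcu_err;
		/* -EAGAIN or stale seq: fall back to the reference path. */
	}

	lk->dentry->refcount++;     /* take the actual full ref */
	lk->dentry->refs_taken++;
	int err = ref_op(lk->dentry, out);
	lk->dentry->refcount--;     /* models path_put() */
	return err;
}
```

With a lookup whose sampled seq matches the dentry and rcu set, calling
model_stat() with model_copy_size for both callbacks fills in the size
and leaves refs_taken untouched. Passing model_always_again as the RCU
callback, or a stale seq, goes through the reference path instead.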
Regardless of the above, I think decoupling the actual dentry ref from
d_lock is a valuable step anyway; I am going to take a stab at that too.
Most of the work is kind of already done, with the 1->0 transition
already handled. It just needs the non-atomic updates replaced with
atomics and cmpxchg, with a flag to whack new additions.

All that aside, the lockref patch reported here needs to get dropped
from the tree and I don't think a lockref-specific replacement is
viable.

-- 
Mateusz Guzik