Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression

linux-mm.kvack.org archive mirror

From: Mateusz Guzik <mjguzik@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
	kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev,  lkp@intel.com,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org,  ying.huang@intel.com,
	feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
Date: Tue, 2 Jul 2024 19:02:56 +0200
Message-ID: <CAGudoHGuTP-nv=zwXdQs38OEqb=BD=i-vA-9xjZ0UOyvWuXP_w@mail.gmail.com>
In-Reply-To: <CAHk-=wgnDSS7yqNbQQ9R6Zt7gzg6SKs6myW1AfkvhApXKgUg4A@mail.gmail.com>

On Tue, Jul 2, 2024 at 6:47 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, 2 Jul 2024 at 05:10, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > Well there is also the option of going full RCU in the fast path, which
> > I mentioned last time around lockref.
> >
> > This would be applicable at least for stat, fstatx, readlink and access
> > syscalls.
>
> Yes. That would be the optimal thing - have some "don't take a lockref
> on the last component at all, because we will finish the use of it
> under RCU".
>
> I looked at that some time ago, and it didn't look _horrendous_ from a
> conceptual standpoint, but the details just got to be nasty.
>
> What I wanted to do was to hook into the "we're still in RCU mode"
> with a callback that stat could set.
>
> And we'd call it at complete_walk() -> try_to_unlazy() ->
> legitimize_path() time just before we do that lockref_get_not_dead()
> thing.
>
> So then the path walkers that are ok with RCU state (ie mostly just
> 'stat()' and friends) could set that callback, and get a callback
> while the path walk is still in RCU mode, and could fill in the stat
> data then and say "I'm done" and we'd never actually finalize the path
> at all, and never do the final lockref_get_not_dead().
>
> Sounds simple in theory. And then when I looked at doing the actual
> code patch, I ended up just running away scared.
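The callback idea quoted above can be sketched in plain user-space C. Everything here is hypothetical and only models the decision point: struct walk_ctx, finish_walk() and stat_cb() are illustrative names, finish_walk() stands in for the complete_walk() -> try_to_unlazy() -> legitimize_path() step just before lockref_get_not_dead(), and a plain int stands in for the lockref. There is no real RCU here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not real kernel structures. */
struct dentry_sim {
	int refcount;   /* stands in for the lockref count */
	long size;      /* data a stat()-like caller wants */
};

struct stat_sim {
	long size;
};

/* Callback run while the walk is still in RCU mode; returns true for "I'm done". */
typedef bool (*rcu_finish_cb)(struct dentry_sim *d, void *data);

struct walk_ctx {
	struct dentry_sim *dentry;  /* last component found by the walk */
	bool in_rcu;                /* still in lazy (RCU) mode? */
	rcu_finish_cb cb;           /* optional, set by stat() and friends */
	void *cb_data;
};

/* Models the point just before the final reference is taken.
 * Returns true if the caller now holds a reference it must drop later. */
static bool finish_walk(struct walk_ctx *w)
{
	assert(w->dentry != NULL);
	if (!w->in_rcu)
		return true;            /* ref-walk mode: reference already held */
	if (w->cb && w->cb(w->dentry, w->cb_data))
		return false;           /* finished under RCU: no reference ever taken */
	w->dentry->refcount++;      /* legitimize: the lockref_get_not_dead() step */
	w->in_rcu = false;
	return true;
}

/* Hypothetical stat callback: copies the data while still in RCU mode. */
static bool stat_cb(struct dentry_sim *d, void *data)
{
	struct stat_sim *st = data;
	st->size = d->size;
	return true;
}
```

The point of the design is that callers which never set a callback behave exactly as before, while stat()-like callers skip the final reference (and the contended lockref) entirely.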

I was thinking of a different approach.

A lookup variant which resolves everything and returns the dentry plus
information on whether the walk is still in RCU mode.

If not, the regular handling + path_put sort it out.

If yes, then the fast path handling gets involved. If a filesystem can
provide a custom callback for the regular usage above, there would be
an optional callback for RCU mode as well (and it would be illegal to
only have one). Should this run into any trouble, it can return -EAGAIN,
at which point a try_to_get_actual_full_ref() (but better named) routine
is called and tries to get the actual ref.

Suppose the callback or in-place handling worked out. Then a routine
to validate that nothing changed (at least the dentry seq?) is called.
Should it succeed, that's it; otherwise the entire thing redoes the work
the old-fashioned way.

I have not looked too closely yet, but I think this is very much doable
without much swearing. I am going to look into it once I find some
time, maybe this weekend.

Regardless of the above, I think decoupling the actual dentry ref from
d_lock is a valuable step anyway, and I am going to take a stab at that
too. Most of the work is kind of already done, with the 1->0 transition
already handled. It just needs the non-atomic updates replaced with
atomics, and cmpxchg with a flag to whack new additions.
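The general pattern (a count updated with atomics, plus a flag in the same word that blocks new additions via cmpxchg) can be illustrated with C11 atomics. This is a generic sketch, not the kernel's lockref or dentry code: struct ref_sim, REF_DEAD, ref_get_not_dead(), ref_put() and ref_kill() are hypothetical names, and memory ordering is kept deliberately simple.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: high bit = dead flag, remaining bits = count. */
#define REF_DEAD (1u << 31)

struct ref_sim {
	atomic_uint val;
};

/* Add a reference unless the dead flag is set; no spinlock involved.
 * The flag check and the increment happen in one cmpxchg. */
static bool ref_get_not_dead(struct ref_sim *r)
{
	unsigned old = atomic_load_explicit(&r->val, memory_order_relaxed);

	do {
		if (old & REF_DEAD)
			return false;   /* new additions are refused */
	} while (!atomic_compare_exchange_weak_explicit(&r->val, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Drop a reference; returns true on the 1->0 transition. */
static bool ref_put(struct ref_sim *r)
{
	unsigned old = atomic_fetch_sub_explicit(&r->val, 1, memory_order_acq_rel);

	assert(old != 0 && !(old & REF_DEAD));  /* no puts on dead/unreferenced objects */
	return old == 1;
}

/* Mark an unreferenced object dead: only 0 -> REF_DEAD succeeds, so a racing
 * ref_get_not_dead() either wins (and this fails) or observes the flag. */
static bool ref_kill(struct ref_sim *r)
{
	unsigned expected = 0;

	return atomic_compare_exchange_strong_explicit(&r->val, &expected, REF_DEAD,
						       memory_order_acq_rel,
						       memory_order_relaxed);
}
```

Because the flag and the count share one atomic word, a single compare-and-exchange both checks for death and takes the reference, which is what lets the count be updated without holding d_lock.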

All that aside, the lockref patch reported here needs to get dropped
from the tree and I don't think a lockref-specific replacement is
viable.
-- 
Mateusz Guzik <mjguzik gmail.com>

Thread overview: 25+ messages
2024-06-27  2:41 kernel test robot
2024-06-27  6:25 ` Mateusz Guzik
2024-06-27  7:00   ` Mateusz Guzik
2024-06-27 16:32     ` Linus Torvalds
2024-06-27 16:55       ` Mateusz Guzik
2024-06-27 16:57       ` Linus Torvalds
2024-06-27 17:20         ` Mateusz Guzik
2024-06-27 17:23         ` Linus Torvalds
2024-07-02  7:19 ` Mateusz Guzik
2024-07-02 12:10   ` Mateusz Guzik
2024-07-02 16:47     ` Linus Torvalds
2024-07-02 17:02       ` Mateusz Guzik [this message]
2024-07-02 17:28         ` Linus Torvalds
2024-07-02 17:46           ` Mateusz Guzik
2024-07-02 17:58             ` Mateusz Guzik
2024-07-02 18:41               ` Linus Torvalds
2024-07-02 20:33                 ` Mateusz Guzik
2024-07-02 20:42                   ` Linus Torvalds
2024-07-02 21:15                     ` Mateusz Guzik
2024-07-02 22:14                       ` Linus Torvalds
2024-07-03 13:53                         ` Mateusz Guzik
2024-07-03 14:08                           ` Christian Brauner
2024-07-03 14:11                             ` Mateusz Guzik
2024-07-03 16:47                           ` Linus Torvalds
2024-07-03  8:34   ` Christian Brauner
