From: Linus Torvalds <torvalds@linux-foundation.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	 Linux Memory Management List <linux-mm@kvack.org>,
	Christian Brauner <brauner@kernel.org>,
	linux-kernel@vger.kernel.org,  ying.huang@intel.com,
	feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
Date: Thu, 27 Jun 2024 09:32:01 -0700
Message-ID: <CAHk-=wg7PXo_QbBo8gv27OpbMgAwLh9H46kJRxAmp0FL0QD7HA@mail.gmail.com>
In-Reply-To: <CAGudoHFrMkdo1CoVxJUiEvQ_DyW3hzaCz18GjvLi4ny=o-q9ZQ@mail.gmail.com>

On Thu, 27 Jun 2024 at 00:00, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> I'm arranging access to a 128-way machine to prod, will report back
> after I can profile on it.

Note that the bigger problem is probably not the 128-way part, but the
"two socket" part.

From a quick look at the profile data, we have for self-cycles:

  shell1 subtest:
      +4.2   lockref_get_not_dead
      +4.5   lockref_put_return
     +16.8   native_queued_spin_lock_slowpath

  getdent subtest:
      +4.1   lockref_put_return
      +5.7   lockref_get_not_dead
     +68.0   native_queued_spin_lock_slowpath

which means that the spin lock got much more expensive, _despite_ the
"fast path" in lockref being more aggressive.

Which in turn implies that the problem may be at least partly simply
due to much more cacheline ping-pong. In particular, the lockref
routines may be "fast", but they hit that one cacheline over and over
again and have a thundering herd issue, while the queued spinlocks on
their own actually try to avoid that for multiple CPUs.
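
For reference, the reason it is all one cacheline: the lockref fast
path packs the spinlock and the reference count into a single 64-bit
word, so every get/put is a cmpxchg on the same 8 bytes. Roughly (a
simplified version of include/linux/lockref.h; the union is
conditional on the arch supporting a 64-bit cmpxchg):

    struct lockref {
        union {
            aligned_u64 lock_count;    /* what the fast path cmpxchg's */
            struct {
                spinlock_t lock;
                int count;
            };
        };
    };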

IOW, the queue in the queued spinlocks isn't primarily about fairness
(although that is a thing), it's about not having all CPU cores
accessing the same spinlock cacheline.
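
To make that concrete, here is a minimal MCS-style sketch (made-up
names, not the kernel's actual qspinlock, which layers a pending bit
and per-CPU nodes on top of this idea): each waiter spins on its own
queue node, so the shared cacheline moves once per lock handoff
instead of bouncing continuously between all the waiters:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;
    };

    void mcs_lock(_Atomic(struct mcs_node *) *tail, struct mcs_node *me)
    {
        struct mcs_node *prev;

        atomic_store(&me->next, NULL);
        atomic_store(&me->locked, true);

        /* One atomic on the shared tail pointer to join the queue. */
        prev = atomic_exchange(tail, me);
        if (prev) {
            atomic_store(&prev->next, me);
            /* Spin on *our own* node, not the shared lock word. */
            while (atomic_load(&me->locked))
                ;
        }
    }

    void mcs_unlock(_Atomic(struct mcs_node *) *tail, struct mcs_node *me)
    {
        struct mcs_node *succ = atomic_load(&me->next);

        if (!succ) {
            /* Nobody visibly queued: try to reset the tail. */
            struct mcs_node *expected = me;
            if (atomic_compare_exchange_strong(tail, &expected, NULL))
                return;
            /* A waiter is mid-enqueue; wait for it to link itself. */
            while (!(succ = atomic_load(&me->next)))
                ;
        }
        /* The handoff touches only the successor's private node. */
        atomic_store(&succ->locked, false);
    }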

Note that a lot of the other numbers seem "incidental". For example,
for the getdents subtest we have a lot of the numbers going down by
~55%, but while that looks like a big change, it's actually just a
direct result of this:

     -56.5%  stress-ng.getdent.ops

IOW, the benchmark fundamentally did ~56% less work.

IOW, I think this may all be fundamental, and we just can't do the
"wait for spinlock" thing, because that whole loop with a cpu_relax()
is just deadly.

And we've seen those kinds of busy-loops be huge problems before. When
you have this kind of busy-loop:

    old.lock_count = READ_ONCE(lockref->lock_count);
    do {
        if (lockref_locked(old)) {
            /* lock held: spin-wait on the shared cacheline */
            cpu_relax();
            old.lock_count = READ_ONCE(lockref->lock_count);
            continue;
        }
        /* ... otherwise try the cmpxchg fast path ... */

the "cpu_relax()" is horrendously expensive, but not having it is not
really an option either, since it will just cause a tight core-only
loop.

I suspect removing the cpu_relax() would help performance, but mainly
because it would effectively cut the wait cycles down to practically
nothing.
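
For contrast, a rough sketch of the shape the code has when it does
*not* wait (simplified from the CMPXCHG_LOOP in mainline
lib/lockref.c, reusing the hypothetical lockref_locked() helper from
the snippet above): if the lock is held, or the cmpxchg keeps
failing, give up quickly and fall back to taking the spinlock for
real, which at least queues politely:

    int retry = 100;
    struct lockref old, new;

    old.lock_count = READ_ONCE(lockref->lock_count);
    while (!lockref_locked(old)) {
        new = old;
        new.count++;
        /* On failure this updates 'old', so the loop re-checks
         * whether the lock got taken in the meantime. */
        if (try_cmpxchg64_relaxed(&lockref->lock_count,
                                  &old.lock_count, new.lock_count))
            return;                /* fast path succeeded */
        if (!--retry)
            break;                 /* too contended: punt */
    }

    spin_lock(&lockref->lock);     /* slow path: queue with everyone else */
    lockref->count++;
    spin_unlock(&lockref->lock);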

               Linus

