From: Mateusz Guzik <mjguzik@gmail.com>
Date: Thu, 27 Jun 2024 18:55:02 +0200
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
To: Linus Torvalds
Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List, Christian Brauner, linux-kernel@vger.kernel.org, ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
On Thu, Jun 27, 2024 at 6:32 PM Linus Torvalds wrote:
>
> On Thu, 27 Jun 2024 at 00:00, Mateusz Guzik wrote:
> >
> > I'm arranging access to a 128-way machine to prod, will report back
> > after I can profile on it.
>
> Note that the bigger problem is probably not the 128-way part, but the
> "two socket" part.
>

I know, mate, I'm painfully aware of NUMA realities. The bigger stuff I
have intermittent access to is all multi-socket; the above is basically
"need to use something bigger than my usual trusty small-core setup".

> From a quick look at the profile data, we have for self-cycles:
>
>   shell1 subtest:
>      +4.2 lockref_get_not_dead
>      +4.5 lockref_put_return
>     +16.8 native_queued_spin_lock_slowpath
>
>   getdent subtest:
>      +4.1 lockref_put_return
>      +5.7 lockref_get_not_dead
>     +68.0 native_queued_spin_lock_slowpath
>
> which means that the spin lock got much more expensive, _despite_ the
> "fast path" in lockref being more aggressive.
>
> Which in turn implies that the problem may be at least partly simply
> due to much more cacheline ping-pong. In particular, the lockref
> routines may be "fast", but they hit that one cacheline over and over
> again and have a thundering herd issue, while the queued spinlocks on
> their own actually try to avoid that for multiple CPU's.
>
> IOW, the queue in the queued spinlocks isn't primarily about fairness
> (although that is a thing), it's about not having all CPU cores
> accessing the same spinlock cacheline.
>
> Note that a lot of the other numbers seem "incidental". For example,
> for the getdents subtest we have a lot of the numbers going down by
> ~55%, but while that looks like a big change, it's actually just a
> direct result of this:
>
>     -56.5% stress-ng.getdent.ops
>
> iow, the benchmark fundamentally did 56% less work.
>
> IOW, I think this may all be fundamental, and we just can't do the
> "wait for spinlock" thing, because that whole loop with a cpu_relax()
> is just deadly.
>
> And we've seen those kinds of busy-loops be huge problems before. When
> you have this kind of busy-loop:
>
>         old.lock_count = READ_ONCE(lockref->lock_count);
>         do {
>                 if (lockref_locked(old)) {
>                         cpu_relax();
>                         old.lock_count = READ_ONCE(lockref->lock_count);
>                         continue;
>                 }
>
> the "cpu_relax()" is horrendously expensive, but not having it is not
> really an option either, since it will just cause a tight core-only
> loop.
>
> I suspect removing the cpu_relax() would help performance, but I
> suspect the main help would come from it effectively cutting down the
> wait cycles to practically nothing.
>

As far as lockref goes, I had two ideas to test:

1. cpu_relax() more than once per check, backoff style. Note that the
total spin count would still be bounded before the routine gives up and
takes the lock. This should reduce the cacheline pulling.

2. Check how many spins are actually needed before lockref decides to
fall back to locking.

When messing around with not automatically taking the lock, I measured
spin counts with an artificially high limit. It was something like < 300
to cover the cases I ran into (not this benchmark). For all I know the
limit can be bumped to -- say 256 -- and numerous lock acquires will
disappear at that scale, which should be good enough for everybody(tm).
All while forward progress is guaranteed.
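To put some shape on idea 1: instead of a single cpu_relax() per check,
the wait loop from the snippet you quoted could back off progressively
while still capping the total spin count before falling back to the
lock. Completely untested sketch (lockref_locked() is reused from your
snippet; the 256 cap and the 16 backoff ceiling are numbers I made up):

	int spins = 0, backoff = 1;

	old.lock_count = READ_ONCE(lockref->lock_count);
	while (lockref_locked(old)) {
		int i;

		if (spins >= 256)
			break;	/* cap hit: give up and take the spinlock */
		for (i = 0; i < backoff; i++)
			cpu_relax();	/* stay off the cacheline longer each round */
		spins += backoff;
		if (backoff < 16)
			backoff <<= 1;	/* back off a little harder next time */
		old.lock_count = READ_ONCE(lockref->lock_count);
	}
	/* lock observed free (or cap hit): retry the cmpxchg / lock as before */

The exact constants don't matter much. The point is that the total spin
count stays bounded (so forward progress is still guaranteed), but the
line gets re-read less often while waiting, so it should bounce less.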
That aside, the getdent thing also uses mutexes, and there is something
odd going on there. I verified that workers go to sleep when it is
avoidable, which disfigures the result. I don't know the scale of it;
it may be that it is a tiny fraction of consumers, but I do see them in
offcpu tracing.

That's that for handwaving. I'm going to get the hw and come back with
hard profiling data + results.

--
Mateusz Guzik