From: Linus Torvalds
Date: Thu, 27 Jun 2024 09:32:01 -0700
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
To: Mateusz Guzik
Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List, Christian Brauner, linux-kernel@vger.kernel.org, ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
References: <202406270912.633e6c61-oliver.sang@intel.com>

On Thu, 27 Jun 2024 at 00:00, Mateusz Guzik wrote:
>
> I'm arranging access to a 128-way machine to prod, will report back
> after I can profile on it.

Note that the bigger problem is probably not the 128-way part, but the
"two socket" part.

From a quick look at the profile data, we have for self-cycles:

  shell1 subtest:
      +4.2  lockref_get_not_dead
      +4.5  lockref_put_return
     +16.8  native_queued_spin_lock_slowpath

  getdent subtest:
      +4.1  lockref_put_return
      +5.7  lockref_get_not_dead
     +68.0  native_queued_spin_lock_slowpath

which means that the spin lock got much more expensive, _despite_ the
"fast path" in lockref being more aggressive.

Which in turn implies that the problem may be at least partly simply
due to much more cacheline ping-pong. In particular, the lockref
routines may be "fast", but they hit that one cacheline over and over
again and have a thundering herd issue, while the queued spinlocks on
their own actually try to avoid that for multiple CPUs.

IOW, the queue in the queued spinlocks isn't primarily about fairness
(although that is a thing), it's about not having all CPU cores
accessing the same spinlock cacheline.

Note that a lot of the other numbers seem "incidental". For example,
for the getdent subtest we have a lot of the numbers going down by
~55%, but while that looks like a big change, it's actually just a
direct result of this:

     -56.5%  stress-ng.getdent.ops

IOW, the benchmark fundamentally did 56% less work.

IOW, I think this may all be fundamental, and we just can't do the
"wait for spinlock" thing, because that whole loop with a cpu_relax()
is just deadly.

And we've seen those kinds of busy-loops be huge problems before. When
you have this kind of busy-loop:

        old.lock_count = READ_ONCE(lockref->lock_count);
        do {
                if (lockref_locked(old)) {
                        cpu_relax();
                        old.lock_count = READ_ONCE(lockref->lock_count);
                        continue;
                }

the "cpu_relax()" is horrendously expensive, but not having it is not
really an option either, since it will just cause a tight core-only
loop.

I suspect removing the cpu_relax() would help performance, but I
suspect the main help would come from it effectively cutting down the
wait cycles to practically nothing.

              Linus
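
For readers who want to poke at the busy-loop pattern quoted above outside the kernel, here is a minimal user-space sketch in C11. It is not the kernel's lockref code: `struct toy_lockref`, the bit layout (bit 0 as the "locked" flag, the count in the upper bits), `toy_relax()` as a stand-in for cpu_relax(), and the `max_spins` bound are all assumptions made for illustration. It only shows the shape of the loop: re-read the shared word while it looks locked, otherwise try a compare-and-swap to bump the count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct lockref: one 64-bit word,
 * bit 0 = "locked" flag, remaining bits = reference count. */
struct toy_lockref {
    _Atomic uint64_t lock_count;
};

#define TOY_LOCKED  ((uint64_t)1)
#define TOY_ONE_REF ((uint64_t)2)   /* a count of 1, shifted past the lock bit */

static inline bool toy_locked(uint64_t v)    { return (v & TOY_LOCKED) != 0; }
static inline uint64_t toy_count(uint64_t v) { return v >> 1; }

/* Assumed placeholder for cpu_relax(): just a compiler barrier here. */
static inline void toy_relax(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Try to take a reference without taking the lock.
 * While the word looks locked, spin and re-read it (the pattern discussed
 * in the mail); every re-read touches the same shared cacheline.
 * Returns false after max_spins attempts so a caller could fall back to
 * taking the lock instead. */
static bool toy_get_lockless(struct toy_lockref *ref, int max_spins)
{
    uint64_t old = atomic_load_explicit(&ref->lock_count, memory_order_relaxed);

    while (max_spins-- > 0) {
        if (toy_locked(old)) {
            toy_relax();
            old = atomic_load_explicit(&ref->lock_count, memory_order_relaxed);
            continue;
        }
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_strong_explicit(&ref->lock_count, &old,
                                                    old + TOY_ONE_REF,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return true;
    }
    return false;
}
```

With the lock bit clear, toy_get_lockless() succeeds on the first compare-and-swap; with the lock bit set and nobody clearing it, it burns through max_spins re-reads and gives up. Running many threads against one such word on a multi-socket machine is one way to observe the cacheline ping-pong described above, although the numbers will depend heavily on the hardware.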