From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 2 Jul 2024 14:10:32 +0200
From: Mateusz Guzik <mjguzik@gmail.com>
To: Linus Torvalds, Christian Brauner
Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com,
	Linux Memory Management List, linux-kernel@vger.kernel.org,
	ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput
	-33.7% regression
References: <202406270912.633e6c61-oliver.sang@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
On Tue, Jul 02, 2024 at 09:19:10AM +0200, Mateusz Guzik wrote:
> On Thu, Jun 27, 2024 at 10:41:35AM +0800, kernel test robot wrote:
> > 
> > Hello,
> > 
> > kernel test robot noticed a -33.7% regression of unixbench.throughput on:
> > 
> > commit: d042dae6ad74df8a00ee8a3c6b77b53bc9e32f64 ("lockref: speculatively spin waiting for the lock to be released")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> [..]
> > | testcase: change | stress-ng: stress-ng.getdent.ops_per_sec -56.5% regression |
> [..]
> > | testcase: change | stress-ng: stress-ng.handle.ops_per_sec -60.2% regression |
> > | test machine | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
> [..]
> 
> The good and the bad news is that this particular patch should be dropped
> (but the struct dentry reordering should stay). There is some gnarly stuff
> here, please read the entire thing, even though it may seem rambly at
> times.
> 
> To recap, there can be a microbenchmark setting with continuous get/put
> traffic where only sporadically someone takes the lock. The constant
> traffic paired with an immediate fallback to locking makes the primitives
> degrade to locked operation until the benchmark concludes. This is the
> problem the patch tried to address, as with the dentry reordering I found
> myself running into the degraded state by accident a lot. There was
> worry the lkp machinery would too, disfiguring its results.
> 
> However, another microbenchmark setting can be mixing explicit spinlock
> acquires with lockref get/put. With a high enough worker count the lock
> can be continuously held, making any spinning waiting for it to be
> released actively detrimental to performance. This is what the lkp folks
> ran into.
> 
> So I got a 2 socket * 48 cores * 2 threads machine to play with.
> 
> Both the unixbench and stress-ng --getdent testcases are heavily mixed
> with read-write locking and mutexes, while stress-ng --handle is all
> about spinlocks and 1/3rd of it is lockref, thus I only benchmarked the
> last one.
> 
> At a *low* scale of 24 workers on my usual test box performance improves
> immensely (from ~165k ops/s to ~620k ops/s) as locking is avoided plenty
> of times.
> 
> At 192 workers on the big box the stock kernel achieves ~134k ops/s,
> while spinwait drops it to ~42k.
> As an experiment I added a dedicated waiting
> loop with a configurable spin count. Doing merely one spin drops the rate
> to ~124k ops/s.
> 
> I figured I would write the smallest patch which avoids locked-only
> degradation and call it a day -- for example it is possible to check if
> the lock is contended thanks to the arch_spin_is_contended macro.
> 
> Then the loop could be:
> 	if (lockref_locked(old)) {
> 		if (locked_contested(old))
> 			break;
> 		cpu_relax();
> 		old.lock_count = READ_ONCE(lockref->lock_count);
> 		continue;
> 	}
> 
> Note how this avoids any additional accesses to the cacheline if the
> lock is under traffic. While this would avoid degrading in certain
> trivial cases, it very much can still result in lockref get/put being in
> the picture *and* only taking locks, so I'm not particularly happy with
> it.
> 
> This is where I found that going to 60 workers issuing open or stat on
> the same dentry *also* permanently degrades, for example:
> 
> # ./stat2_processes -t 60
> testcase: Same file stat
> warmup
> min:11293 max:25058 total:1164504
> min:10706 max:12977 total:679839
> min:10303 max:12222 total:643437
> min:8722 max:9846 total:542022
> min:8514 max:9554 total:529731
> min:8349 max:9296 total:519381
> measurement
> min:8203 max:9079 total:510768
> min:8122 max:8913 total:504069
> 
> According to perf top it's all contention on the spinlock and it is the
> lockref routines taking it. So something sneaks in a lock/unlock cycle
> and there is enough traffic that the default 100 spins with the patch
> are not sufficient to wait it out. With the configurable spin count I
> found this can stay lockless if I make it wait 1000 spins. Getting the
> spin count "wrong" further reduces performance -- for example a 700
> spin limit was not enough to wait out the lock holders and throughput
> went down to ~269k due to time wasted spinning, merely a little more
> than half of the degraded state above.
> 
> That is to say, messing with lockref itself is probably best avoided, as
> getting a win depends on being at the right (low) scale or getting
> "lucky".
> 
> My worry about locked-only degradation showing up was not justified, in
> that for the scales used by the lkp folks it already occurs.
> 
> All in all the thing to do is to find out who takes the spinlock and
> whether it can be avoided. Perhaps this can be further split so that only
> spots depending on the refcount take the d_lock (for example in the
> --handle test the lock is taken to iterate the alias list -- it would not
> mind refs going up or down).
> 
> It may be (and I'm really just handwaving here) that dentries should move
> away from lockref altogether in favor of storing flags in the refcount.
> Even if both get/put would still have to cmpxchg, they would never have
> to concern themselves with locking of any sort, except when the dentry is
> going down. And probably the unref side could merely lock xadd into it.
> I recognize this would convert some regular ops in other parts of the
> kernel into atomics and it may be undesirable. Something to ponder
> nonetheless.
> 
> At the end of the day:
> 1. this patch should be dropped; I don't think there is a good
>    replacement either
> 2. the real fix would do something about the dentry lock itself or its
>    consumers (maybe the lock is taken way more than it should be?)

Well, there is also the option of going full RCU in the fast path, which
I mentioned last time around lockref. This would be applicable at least
to the stat, statx, readlink and access syscalls. It would not help the
above bench though, but then again I don't think it is doing anything
realistic.

I'm going to poke around it when I find the time. Maybe ditching lockref
would be a good idea regardless, if the above pans out.