From: Marco Elver
Date: Fri, 2 Feb 2024 18:19:42 +0100
Subject: Re: regression/bisected commit 773688a6cb24b0b3c2ba40354d883348a2befa38 make my system completely unusable under high load
To: Mikhail Gavrilov
Cc: Andrey Konovalov, glider@google.com, dvyukov@google.com, eugenis@google.com, Oscar Salvador, Vlastimil Babka, Andrew Morton, Linux List Kernel Mailing, Linux Memory Management List

On Fri, 2 Feb 2024 at 17:47, Marco Elver wrote:
>
> On Fri, 2 Feb 2024 at 17:35, Mikhail Gavrilov wrote:
> >
> > On Fri, Feb 2, 2024 at 2:00 PM Marco Elver wrote:
> > >
> > > > Maybe we can try something else?
> > >
> > > That's strange - the patches at [1] definitely revert the change you
> > > bisected to. It's possible there is some other strange side-effect. (I
> > > assume that you are still running all this with a KASAN kernel.)
> >
> > Yes. build .config not changed between kernel builds.
> >
> > > Just so I understand it right:
> > > You say before commit cc478e0b6bdffd20561e1a07941a65f6c8962cab the
> > > game's FPS were good. But that is strange, because at that point we're
> > > already doing stackdepot refcounting, i.e. after commit
> > > 773688a6cb24b0b3c2ba40354d883348a2befa38 which you reported as the
> > > initial performance regression. The patches at [2] fixed that problem.
> > >
> > > So now it's unclear to me how the simple change in
> > > cc478e0b6bdffd20561e1a07941a65f6c8962cab causes the performance
> > > problem, when in fact this is already with KASAN stackdepot
> > > refcounting enabled but without the performance fixes from [1] and
> > > [2].
> > >
> > > [2] https://lore.kernel.org/all/20240118110216.2539519-2-elver@google.com/
> > >
> > > My questions now would be:
> > > - What was the game's FPS in the last stable kernel (v6.7)?
> >
> > [6.7] - 83 FPS - 13060 frames during benchmark.
> >
> > > - Can you collect another set of performance profiles between good and
> > > bad? Maybe it would show where the time in the kernel is spent.
> >
> > Yes,
> > please look at [aaa2c9a97c22 perf] and [cc478e0b6bdf perf]
> >
> > > perf diff perf-git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.data perf-git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.data
> > No kallsyms or vmlinux with build-id de2a040f828394c5ce34802389239c2a0668fcc7 was found
> > No kallsyms or vmlinux with build-id 33ab1cd545f96f5ffc2a402a4c4cfa647fd727a0 was found
> > # Event 'cycles:P'
> > #
> > # Baseline  Delta Abs  Shared Object          Symbol
> > # ........  .........  .....................  ..................................
> > #
> >     48.48%    +21.75%  [kernel.kallsyms]      [k] 0xffffffff860065c0
> >     36.13%    -16.49%  ShadowOfTheTombRaider  [.] 0x00000000001d7f5e
> >      4.43%     -2.10%  libvulkan_radeon.so    [.] 0x000000000006b870
> >      3.28%     -0.63%  libcef.so              [.] 0x00000000021720e0
> >      1.11%     -0.53%  libc.so.6              [.] syscall
> >      0.65%     -0.24%  libc.so.6              [.] __memmove_avx512_unaligned_erms
> >      0.31%     -0.14%  libc.so.6              [.] __memset_avx512_unaligned_erms
> >      0.26%     -0.13%  libm.so.6              [.] __powf_fma
> >      0.20%     -0.10%  [amdgpu]               [k] amdgpu_bo_placement_from_domain
> >      0.22%     -0.09%  [amdgpu]               [k] amdgpu_vram_mgr_compatible
> >      0.67%     -0.09%  armada-drm_dri.so      [.] 0x00000000000192b4
> >      0.15%     -0.08%  libc.so.6              [.] sem_post@GLIBC_2.2.5
> >      0.16%     -0.07%  [amdgpu]               [k] amdgpu_vm_bo_update
> >      0.14%     -0.07%  [amdgpu]               [k] amdgpu_bo_list_entry_cmp
> >      0.13%     -0.06%  libm.so.6              [.] powf@GLIBC_2.2.5
> >      0.14%     -0.06%  libMangoHud.so         [.] 0x000000000001c4c0
> >      0.10%     -0.06%  libc.so.6              [.] __futex_abstimed_wait_common
> >      0.19%     -0.05%  libGLESv2.so           [.] 0x0000000000160a11
> >      0.07%     -0.04%  libc.so.6              [.] __new_sem_wait_slow64.constprop.0
> >      0.10%     -0.04%  radeonsi_dri.so        [.] 0x0000000000019454
> >      0.05%     -0.03%  [amdgpu]               [k] optc1_get_position
> >      0.05%     -0.03%  libc.so.6              [.] sem_wait@@GLIBC_2.34
> >      0.22%     -0.02%  [vdso]                 [.] 0x00000000000005a0
> >      0.10%     -0.02%  libc.so.6              [.] __memcmp_evex_movbe
> >                +0.02%  [JIT] tid 8383         [.] 0x00007f2de0052823
> >
> >
> > > - Could it be an inconclusive bisection?
> >
> > I checked twice:
> > [6.7] - 83 FPS
> > [aaa2c9a97c22] - 111 FPS
> > [cc478e0b6bdf] - 64 FPS
> > [6.8-rc2 with patches] - 82 FPS
> >
> >
> > [6.7] https://i.postimg.cc/15yyzZBr/v6-7.png
> > [6.7 perf] https://mega.nz/file/QwJ3hbob#RslLFVYgz1SWMcPR3eF9uEpFuqxdgkwXSatWts-1wVA
> >
> > [aaa2c9a97c22] https://i.postimg.cc/Sxv4VYhg/git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.png
> > [aaa2c9a97c22 perf]
> > https://mega.nz/file/dwQxha4J#2_nBF6uNzY11VX-T-Lr_-60WIMrbl1YEvPgY4CuXqEc
> >
> > [cc478e0b6bdf] https://i.postimg.cc/W3cQfMfw/git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.png
> > [cc478e0b6bdf perf]
> > https://mega.nz/file/hl5kwLTC#_4Fg1KBXCnQ-8OElY7EYmPOoDG6ZeZYnKFjamWpklWw
> >
> > [6.8-rc2 with patches] https://i.postimg.cc/26dPpVsR/v6-8-rc2-with-patches.png
> > [6.8-rc2 with patches perf]
> > https://mega.nz/file/NxgTAb4L#0KO_WU-svpDw60Y3148RZhELPcUtFg3_VCDzJqSyz34
>
> Thanks a lot for these results. There's definitely something strange
> going - I'll try to have a detailed look some time next week.
>
> In the meantime, this is clear: there does not seem to be a regression
> between 6.7 and 6.8-rc with the patches, which is what I was
> expecting. The fact that aaa2c9a97c22 is so much better could indicate
> that until cc478e0b6bdf there was either a bug which turned something
> into a no-op - or, the memsets() were acting as some kind of
> prefetching hint to the CPU, which in turn caused a significant
> reduction in cache misses. I think at this point we're not trying to
> fix a regression, because we're on par with 6.7, but trying to make
> sense of this information to optimize the code properly without luck
> (but not sure if feasible).

Hrm... Your config has lockdep enabled, right? cc478e0b6bdf fixed an
issue with lockdep, so does your kernel before that commit (i.e. at
aaa2c9a97c22) show any lockdep errors in dmesg? If lockdep encounters
an error, it usually turns itself off right away, which would explain
the improved performance. :-)
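
A quick way to test the lockdep theory on a running kernel is to look at
the debug_locks counter, which drops to 0 once lockdep has disabled
itself. The sketch below is only an illustration under assumptions that
should be verified on the kernel under test: that /proc/lockdep_stats
exists (lockdep built in, typically readable by root only) and that it
contains a line of the form "debug_locks: <0 or 1>". The helper name
lockdep_enabled is made up for this example.

```python
import re
from pathlib import Path
from typing import Optional


def lockdep_enabled(stats_text: str) -> Optional[bool]:
    """Parse lockdep_stats-style text (hypothetical helper).

    Returns True if the debug_locks counter is non-zero, False if it is 0,
    and None if no debug_locks line is found (assumed line format:
    "debug_locks: <number>").
    """
    match = re.search(r"^\s*debug_locks:\s*(\d+)\s*$", stats_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) != 0


if __name__ == "__main__":
    # Assumed location; needs a lockdep-enabled kernel and usually root.
    path = Path("/proc/lockdep_stats")
    try:
        state = lockdep_enabled(path.read_text())
    except OSError as err:
        print(f"cannot read {path}: {err}")
    else:
        if state is None:
            print("no debug_locks line found")
        elif state:
            print("lockdep still active")
        else:
            print("lockdep has turned itself off")
```

If this reports that lockdep is off after running the benchmark on
aaa2c9a97c22 but still active on cc478e0b6bdf, that would be consistent
with the explanation above; the corresponding lockdep splat in dmesg
would be the complementary thing to look for.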