From: Marco Elver
Date: Fri, 2 Feb 2024 17:47:38 +0100
Subject: Re: regression/bisected commit 773688a6cb24b0b3c2ba40354d883348a2befa38 make my system completely unusable under high load
To: Mikhail Gavrilov
Cc: Andrey Konovalov, glider@google.com, dvyukov@google.com, eugenis@google.com, Oscar Salvador, Vlastimil Babka, Andrew Morton, Linux List Kernel Mailing, Linux Memory Management List

On Fri, 2 Feb 2024 at 17:35, Mikhail Gavrilov wrote:
>
> On Fri, Feb 2, 2024 at 2:00 PM Marco Elver wrote:
> >
> > > Maybe we can try something else?
> >
> > That's strange - the patches at [1] definitely revert the change you
> > bisected to. It's possible there is some other strange side-effect. (I
> > assume that you are still running all this with a KASAN kernel.)
>
> Yes. build .config not changed between kernel builds.
>
> > Just so I understand it right:
> > You say before commit cc478e0b6bdffd20561e1a07941a65f6c8962cab the
> > game's FPS were good. But that is strange, because at that point we're
> > already doing stackdepot refcounting, i.e. after commit
> > 773688a6cb24b0b3c2ba40354d883348a2befa38 which you reported as the
> > initial performance regression. The patches at [2] fixed that problem.
> >
> > So now it's unclear to me how the simple change in
> > cc478e0b6bdffd20561e1a07941a65f6c8962cab causes the performance
> > problem, when in fact this is already with KASAN stackdepot
> > refcounting enabled but without the performance fixes from [1] and
> > [2].
> >
> > [2] https://lore.kernel.org/all/20240118110216.2539519-2-elver@google.com/
> >
> > My questions now would be:
> > - What was the game's FPS in the last stable kernel (v6.7)?
>
> [6.7] - 83 FPS - 13060 frames during benchmark.
>
> > - Can you collect another set of performance profiles between good and
> > bad? Maybe it would show where the time in the kernel is spent.
>
> Yes,
> please look at [aaa2c9a97c22 perf] and [cc478e0b6bdf perf]
>
> > perf diff perf-git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.data perf-git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.data
> No kallsyms or vmlinux with build-id
> de2a040f828394c5ce34802389239c2a0668fcc7 was found
> No kallsyms or vmlinux with build-id
> 33ab1cd545f96f5ffc2a402a4c4cfa647fd727a0 was found
> # Event 'cycles:P'
> #
> # Baseline  Delta Abs  Shared Object          Symbol
> # ........  .........  .....................  ..................................
> #
>     48.48%    +21.75%  [kernel.kallsyms]      [k] 0xffffffff860065c0
>     36.13%    -16.49%  ShadowOfTheTombRaider  [.] 0x00000000001d7f5e
>      4.43%     -2.10%  libvulkan_radeon.so    [.] 0x000000000006b870
>      3.28%     -0.63%  libcef.so              [.] 0x00000000021720e0
>      1.11%     -0.53%  libc.so.6              [.] syscall
>      0.65%     -0.24%  libc.so.6              [.] __memmove_avx512_unaligned_erms
>      0.31%     -0.14%  libc.so.6              [.] __memset_avx512_unaligned_erms
>      0.26%     -0.13%  libm.so.6              [.] __powf_fma
>      0.20%     -0.10%  [amdgpu]               [k] amdgpu_bo_placement_from_domain
>      0.22%     -0.09%  [amdgpu]               [k] amdgpu_vram_mgr_compatible
>      0.67%     -0.09%  armada-drm_dri.so      [.] 0x00000000000192b4
>      0.15%     -0.08%  libc.so.6              [.] sem_post@GLIBC_2.2.5
>      0.16%     -0.07%  [amdgpu]               [k] amdgpu_vm_bo_update
>      0.14%     -0.07%  [amdgpu]               [k] amdgpu_bo_list_entry_cmp
>      0.13%     -0.06%  libm.so.6              [.] powf@GLIBC_2.2.5
>      0.14%     -0.06%  libMangoHud.so         [.] 0x000000000001c4c0
>      0.10%     -0.06%  libc.so.6              [.] __futex_abstimed_wait_common
>      0.19%     -0.05%  libGLESv2.so           [.] 0x0000000000160a11
>      0.07%     -0.04%  libc.so.6              [.] __new_sem_wait_slow64.constprop.0
>      0.10%     -0.04%  radeonsi_dri.so        [.] 0x0000000000019454
>      0.05%     -0.03%  [amdgpu]               [k] optc1_get_position
>      0.05%     -0.03%  libc.so.6              [.] sem_wait@@GLIBC_2.34
>      0.22%     -0.02%  [vdso]                 [.] 0x00000000000005a0
>      0.10%     -0.02%  libc.so.6              [.] __memcmp_evex_movbe
>                +0.02%  [JIT] tid 8383         [.] 0x00007f2de0052823
>
>
> > - Could it be an inconclusive bisection?
>
> I checked twice:
> [6.7] - 83 FPS
> [aaa2c9a97c22] - 111 FPS
> [cc478e0b6bdf] - 64 FPS
> [6.8-rc2 with patches] - 82 FPS
>
>
> [6.7] https://i.postimg.cc/15yyzZBr/v6-7.png
> [6.7 perf] https://mega.nz/file/QwJ3hbob#RslLFVYgz1SWMcPR3eF9uEpFuqxdgkwXSatWts-1wVA
>
> [aaa2c9a97c22] https://i.postimg.cc/Sxv4VYhg/git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.png
> [aaa2c9a97c22 perf]
> https://mega.nz/file/dwQxha4J#2_nBF6uNzY11VX-T-Lr_-60WIMrbl1YEvPgY4CuXqEc
>
> [cc478e0b6bdf] https://i.postimg.cc/W3cQfMfw/git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.png
> [cc478e0b6bdf perf]
> https://mega.nz/file/hl5kwLTC#_4Fg1KBXCnQ-8OElY7EYmPOoDG6ZeZYnKFjamWpklWw
>
> [6.8-rc2 with patches] https://i.postimg.cc/26dPpVsR/v6-8-rc2-with-patches.png
> [6.8-rc2 with patches perf]
> https://mega.nz/file/NxgTAb4L#0KO_WU-svpDw60Y3148RZhELPcUtFg3_VCDzJqSyz34

Thanks a lot for these results. There's definitely something strange
going on - I'll try to have a detailed look some time next week.

In the meantime, this much is clear: there does not seem to be a
regression between 6.7 and 6.8-rc2 with the patches (83 vs. 82 FPS),
which is what I was expecting.

The fact that aaa2c9a97c22 is so much better could indicate that, until
cc478e0b6bdf, either there was a bug which turned something into a
no-op, or the memset()s were acting as some kind of prefetching hint to
the CPU, which in turn caused a significant reduction in cache misses.

I think at this point we're no longer trying to fix a regression,
because we're on par with 6.7. Instead, we're trying to make sense of
this information so that the code can be optimized deliberately rather
than by luck (though I'm not sure that's feasible).

Hrm....
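
To put the numbers from this thread side by side, here is a minimal Python sketch of the arithmetic. The FPS figures and the two perf percentages are taken from the quoted report; the names `fps`, `relative_change` and `kernel_share_second` are made up for illustration. It also assumes that the "Delta Abs" column of `perf diff` is the second profile's share minus the baseline's, in percentage points, so that baseline plus delta gives the share in the second profile.

```python
# FPS figures as reported in the thread, keyed by kernel build.
fps = {
    "6.7": 83,
    "aaa2c9a97c22": 111,
    "cc478e0b6bdf": 64,
    "6.8-rc2 with patches": 82,
}


def relative_change(value, baseline):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Compare every build against the last stable kernel (v6.7).
baseline_fps = fps["6.7"]
changes = {name: relative_change(v, baseline_fps) for name, v in fps.items()}

for name, pct in changes.items():
    print(f"{name:>22}: {fps[name]:3d} FPS ({pct:+.1f}% vs 6.7)")

# Assumption: baseline share + "Delta Abs" = share in the second profile.
# Baseline is aaa2c9a97c22, second profile is cc478e0b6bdf.
kernel_share_baseline = 48.48
kernel_delta_abs = 21.75
kernel_share_second = kernel_share_baseline + kernel_delta_abs
print(f"[kernel.kallsyms] share in cc478e0b6bdf profile: {kernel_share_second:.2f}%")
```

Under those assumptions, aaa2c9a97c22 is roughly 34% faster than 6.7, cc478e0b6bdf roughly 23% slower, and 6.8-rc2 with the patches within about 1% of 6.7, while the unsymbolized kernel share of cycles goes from about 48% to about 70% between the two profiled builds.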