From: Oleksandr Natalenko
To: x86@kernel.org, Rik van Riel
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
 dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com, nadav.amit@gmail.com,
 thomas.lendacky@amd.com, kernel-team@meta.com, linux-mm@kvack.org,
 akpm@linux-foundation.org, jackmanb@google.com, jannh@google.com,
 mhklinux@outlook.com, andrew.cooper3@citrix.com, Manali.Shukla@amd.com
Subject: Re: [PATCH v12 00/16] AMD broadcast TLB invalidation
Date: Sat, 22 Feb 2025 12:36:32 +0100
Message-ID: <4630159.LvFx2qVVIh@natalenko.name>
In-Reply-To: <5861243.DvuYhMxLoT@natalenko.name>
References: <20250221005345.2156760-1-riel@surriel.com> <5861243.DvuYhMxLoT@natalenko.name>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="nextPart2356740.ElGaqSPkdT"; micalg="pgp-sha256"; protocol="application/pgp-signature"

--nextPart2356740.ElGaqSPkdT
Content-Type: text/plain; charset="utf-8"

On Saturday, 22 February 2025 12:29:54 Central European Standard Time, Oleksandr Natalenko wrote:
> Hello.
>
> On Friday, 21 February 2025 1:52:59 Central European Standard Time, Rik van Riel wrote:
> > Add support for broadcast TLB invalidation using AMD's INVLPGB instruction.
> >
> > This allows the kernel to invalidate TLB entries on remote CPUs without
> > needing to send IPIs, without having to wait for remote CPUs to handle
> > those interrupts, and with less interruption to what was running on
> > those CPUs.
> >
> > Because x86 PCID space is limited, and there are some very large
> > systems out there, broadcast TLB invalidation is only used for
> > processes that are active on 3 or more CPUs, with the threshold
> > being gradually increased the more the PCID space gets exhausted.
> >
> > Combined with the removal of unnecessary lru_add_drain calls
> > (see https://lkml.org/lkml/2024/12/19/1388) this results in a
> > nice performance boost for the will-it-scale tlb_flush2_threads
> > test on an AMD Milan system with 36 cores:
> >
> > - vanilla kernel: 527k loops/second
> > - lru_add_drain removal: 731k loops/second
> > - only INVLPGB: 527k loops/second
> > - lru_add_drain + INVLPGB: 1157k loops/second
> >
> > Profiling with only the INVLPGB changes showed while
> > TLB invalidation went down from 40% of the total CPU
> > time to only around 4% of CPU time, the contention
> > simply moved to the LRU lock.
> >
> > Fixing both at the same time about doubles the
> > number of iterations per second from this case.
> >
> > Some numbers closer to real world performance
> > can be found at Phoronix, thanks to Michael:
> >
> > https://www.phoronix.com/news/AMD-INVLPGB-Linux-Benefits
> >
> > My current plan is to implement support for Intel's RAR
> > (Remote Action Request) TLB flushing in a follow-up series,
> > after this thing has been merged into -tip. Making things
> > any larger would just be unwieldy for reviewers.
> >
> > v12:
> > - make sure "nopcid" command line option turns off invlpgb (Brendan)
> > - add "noinvlpgb" kernel command line option
> > - split out kernel TLB flushing differently (Dave & Yosry)
> > - split up the patch that does invlpgb flushing for user processes (Dave)
> > - clean up get_flush_tlb_info (Boris)
> > - move invlpgb_count_max initialization to get_cpu_cap (Boris)
> > - bunch more comments as requested
>
> Somehow, this iteration breaks resume from S3.
> I can see it even in a QEMU VM:

I can also reproduce this by simply offlining and then onlining a CPU via `/sys/devices/system/cpu/cpuX/online`.

>
> ```
> [ 24.373391] ACPI: PM: Low-level resume complete
> [ 24.373929] ACPI: PM: Restoring platform NVS memory
> [ 24.375024] Enabling non-boot CPUs ...
> [ 24.375777] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [ 24.376463] BUG: unable to handle page fault for address: ffffffffa3ba4d60
> [ 24.377383] #PF: supervisor write access in kernel mode
> [ 24.377912] #PF: error_code(0x0003) - permissions violation
> [ 24.378413] PGD 25427067 P4D 25427067 PUD 25428063 PMD 8000000024c001a1
> [ 24.379020] Oops: Oops: 0003 [#1] PREEMPT SMP NOPTI
> [ 24.379503] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 6.14.0-pf0 #1 161e4891fb5044b2d7438cd1852eeaac0cdffab5
> [ 24.380650] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
> [ 24.381400] RIP: 0010:get_cpu_cap+0x39b/0x4f0
> [ 24.381810] Code: 08 c7 44 24 08 00 00 00 00 48 8d 4c 24 0c e8 3c 00 04 00 90 8b 44 24 04 89 43 64 0f b7 44 24 0c 83 c0 01 81 7b 24 09 00 00 80 <66> 89 05 0e ab 8b 01 0f 86 18 fd ff ff c7 44 24 14 00 00 00 00 4c
> [ 24.383629] RSP: 0000:ffffafbec00efe70 EFLAGS: 00010012
> [ 24.384155] RAX: 0000000000000001 RBX: ffff8b3fbcb19020 RCX: 0000000000001001
> [ 24.384862] RDX: 0000000000000000 RSI: ffffafbec00efe74 RDI: ffffafbec00efe78
> [ 24.385603] RBP: ffffafbec00efe88 R08: ffffafbec00efe70 R09: ffffafbec00efe7c
> [ 24.386318] R10: 0000000000002430 R11: ffff8b3fa5428000 R12: ffffafbec00efe8c
> [ 24.387014] R13: ffffafbec00efe84 R14: ffffafbec00efe80 R15: ffffafbec00efe70
> [ 24.387713] FS: 0000000000000000(0000) GS:ffff8b3fbcb00000(0000) knlGS:0000000000000000
> [ 24.388502] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 24.389074] CR2: ffffffffa3ba4d60 CR3: 0000000025422000 CR4: 0000000000350ef0
> [ 24.389769] Call Trace:
> [ 24.390020]  <TASK>
> [ 24.392234]  identify_cpu+0xd4/0x890
> [ 24.392593]  identify_secondary_cpu+0x12/0x40
> [ 24.393032]  smp_store_cpu_info+0x49/0x60
> [ 24.393430]  start_secondary+0x7f/0x140
> [ 24.393810]  common_startup_64+0x13e/0x141
> [ 24.394218]  </TASK>
>
> $ scripts/faddr2line arch/x86/kernel/cpu/common.o get_cpu_cap+0x39b
> get_cpu_cap+0x39b/0x500:
> get_cpu_cap at …/arch/x86/kernel/cpu/common.c:1063
>
> 1060         if (c->extended_cpuid_level >= 0x80000008) {
> 1061                 cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
> 1062                 c->x86_capability[CPUID_8000_0008_EBX] = ebx;
> 1063                 invlpgb_count_max = (edx & 0xffff) + 1;
> 1064         }
> ```
>
> Any idea what I'm looking at?
>
> Thank you.
>
> > v11:
> > - resolve conflict with CONFIG_PT_RECLAIM code
> > - a few more cleanups (Peter, Brendan, Nadav)
> > v10:
> > - simplify partial pages with min(nr, 1) in the invlpgb loop (Peter)
> > - document x86 paravirt, AMD invlpgb, and ARM64 flush without IPI (Brendan)
> > - remove IS_ENABLED(CONFIG_X86_BROADCAST_TLB_FLUSH) (Brendan)
> > - various cleanups (Brendan)
> > v9:
> > - print warning when start or end address was rounded (Peter)
> > - in the reclaim code, tlbsync at context switch time (Peter)
> > - fix !CONFIG_CPU_SUP_AMD compile error in arch_tlbbatch_add_pending (Jan)
> > v8:
> > - round start & end to handle non-page-aligned callers (Steven & Jan)
> > - fix up changelog & add tested-by tags (Manali)
> > v7:
> > - a few small code cleanups (Nadav)
> > - fix spurious VM_WARN_ON_ONCE in mm_global_asid
> > - code simplifications & better barriers (Peter & Dave)
> > v6:
> > - fix info->end check in flush_tlb_kernel_range (Michael)
> > - disable broadcast TLB flushing on 32 bit x86
> > v5:
> > - use byte assembly for compatibility with older toolchains (Borislav, Michael)
> > - ensure a panic on an invalid number of extra pages (Dave, Tom)
> > - add cant_migrate() assertion to tlbsync (Jann)
> > - a bunch more cleanups (Nadav)
> > - key TCE enabling off X86_FEATURE_TCE (Andrew)
> > - fix a race between reclaim and ASID transition (Jann)
> > v4:
> > - Use only bitmaps to track free global ASIDs (Nadav)
> > - Improved AMD initialization (Borislav & Tom)
> > - Various naming and documentation improvements (Peter, Nadav, Tom, Dave)
> > - Fixes for subtle race conditions (Jann)
> > v3:
> > - Remove paravirt tlb_remove_table call (thank you Qi Zheng)
> > - More suggested cleanups and changelog fixes by Peter and Nadav
> > v2:
> > - Apply suggestions by Peter and Borislav (thank you!)
> > - Fix bug in arch_tlbbatch_flush, where we need to do both
> >   the TLBSYNC, and flush the CPUs that are in the cpumask.
> > - Some updates to comments and changelogs based on questions.

-- 
Oleksandr Natalenko, MSE
--nextPart2356740.ElGaqSPkdT--