Subject: Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
From: Nadav Amit
To: Hugh Dickins
Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
 Andy Lutomirski, Thomas Gleixner, x86@kernel.org, linux-mm@kvack.org
Date: Thu, 7 Jul 2022 22:56:25 -0700
Message-Id: <575B908D-A29B-40B0-9A80-76B7E7A9762E@gmail.com>
In-Reply-To: <904C4BCE-78E7-4FEE-BD8D-03DCE75A5B8B@gmail.com>
References: <20220606180123.2485171-1-namit@vmware.com>
 <904C4BCE-78E7-4FEE-BD8D-03DCE75A5B8B@gmail.com>

On Jul 7, 2022, at 9:23 PM, Nadav Amit wrote:

> On Jul 7, 2022, at 8:27 PM, Hugh Dickins wrote:
>
>> On Mon, 6 Jun 2022, Nadav Amit wrote:
>>
>>> From: Nadav Amit
>>>
>>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>>> contended and reading it should (arguably) be avoided as much as
>>> possible.
>>>
>>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>>> even when it is not necessary (e.g., the mm was already switched).
>>> This is wasteful.
>>>
>>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>>> see if there are additional in-flight TLB invalidations and flush the
>>> entire TLB in such a case. However, if the request's tlb_gen was already
>>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>>> by the overhead of the check itself.
>>>
>>> Running will-it-scale with tlb_flush1_threads shows a considerable
>>> benefit on 56-core Skylake (up to +24%):
>>>
>>> threads  Baseline (v5.17+)  +Patch
>>> 1        159960             160202
>>> 5        310808             308378  (-0.7%)
>>> 10       479110             490728
>>> 15       526771             562528
>>> 20       534495             587316
>>> 25       547462             628296
>>> 30       579616             666313
>>> 35       594134             701814
>>> 40       612288             732967
>>> 45       617517             749727
>>> 50       637476             735497
>>> 55       614363             778913  (+24%)
>>>
>>> Acked-by: Peter Zijlstra (Intel)
>>> Cc: Dave Hansen
>>> Cc: Ingo Molnar
>>> Cc: Andy Lutomirski
>>> Cc: Thomas Gleixner
>>> Cc: x86@kernel.org
>>> Signed-off-by: Nadav Amit
>>>
>>> --
>>>
>>> Note: The benchmarked kernels include Dave's revert of commit
>>> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
>>> tlb_is_not_lazy()")
>>> ---
>>>  arch/x86/mm/tlb.c | 18 +++++++++++++++++-
>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>>> index d400b6d9d246..d9314cc8b81f 100644
>>> --- a/arch/x86/mm/tlb.c
>>> +++ b/arch/x86/mm/tlb.c
>>> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
>>>  	const struct flush_tlb_info *f = info;
>>>  	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>>  	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
>>> -	u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>>  	u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
>>>  	bool local = smp_processor_id() == f->initiating_cpu;
>>>  	unsigned long nr_invalidate = 0;
>>> +	u64 mm_tlb_gen;
>>>
>>>  	/* This code cannot presently handle being reentered. */
>>>  	VM_WARN_ON(!irqs_disabled());
>>> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
>>>  		return;
>>>  	}
>>>
>>> +	if (f->new_tlb_gen <= local_tlb_gen) {
>>> +		/*
>>> +		 * The TLB is already up to date in respect to f->new_tlb_gen.
>>> +		 * While the core might be still behind mm_tlb_gen, checking
>>> +		 * mm_tlb_gen unnecessarily would have negative caching effects
>>> +		 * so avoid it.
>>> +		 */
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Defer mm_tlb_gen reading as long as possible to avoid cache
>>> +	 * contention.
>>> +	 */
>>> +	mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>> +
>>>  	if (unlikely(local_tlb_gen == mm_tlb_gen)) {
>>>  		/*
>>>  		 * There's nothing to do: we're already up to date. This can
>>> --
>>> 2.25.1
>>
>> I'm sorry, but bisection and reversion show that this commit,
>> aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
>> is responsible for the "internal compiler error: Segmentation fault"s
>> I get when running kernel builds on tmpfs in 1G memory, lots of swapping.
>>
>> That tmpfs is using huge pages as much as it can, so splitting and
>> collapsing, compaction and page migration are entailed, in case that's
>> relevant (maybe this commit is perfect, but there's a TLB flushing
>> bug over there in mm which this commit just exposes).
>>
>> Whether those segfaults happen without the huge page element,
>> I have not done enough testing to tell; there are other bugs with
>> swapping in current linux-next. Indeed, I wouldn't even have found
>> this one if I hadn't already been on a bisection for another bug
>> and got thrown off course by these segfaults.
>>
>> I hope that you can work out what might be wrong with this,
>> but meantime I think it needs to be reverted.
>
> I find it always surprising how trivial one-liners fail.
>
> As you probably know, debugging these kinds of things is hard. I see two
> possible cases:
>
> 1. The failure is directly related to this optimization. The immediate
> suspect in my mind is something to do with PCID/ASID.
>
> 2. The failure is due to another bug that was papered over by “enough” TLB
> flushes.
>
> I will look into the code. But if it is possible, it would be helpful to
> know whether you get the failure with the “nopcid” kernel parameter. If it
> passes, it wouldn’t say much, but if it fails, I think (2) is more likely.
>
> Not arguing about a revert, but, in some way, if the test fails, it can
> indicate that the optimization “works”…
>
> I’ll put some time into looking deeper at the code, but it would be very
> helpful if you can let me know what happens with nopcid.

Actually, using only “nopcid” would most likely make the problem go away if
we have PTI enabled. So to get a good indication, we need to check whether it
reproduces with both “nopti” and “nopcid”.

I don’t have a better answer yet. Still trying to see what might have gone
wrong.
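
To spell out the pattern the patch above relies on: do a cheap, CPU-local
generation comparison first, and only read the heavily contended shared
generation counter when that comparison says more work may be needed. The
sketch below is a minimal standalone illustration in plain C11; the names
(shared_gen, local_gen, handle_request) are made up for this example, and it
is not the kernel's flush_tlb_func()/tlb_gen code.

	/* Illustrative sketch only: plain C11 atomics, made-up names. */
	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Heavily contended shared counter, in the spirit of mm->context.tlb_gen. */
	static _Atomic uint64_t shared_gen = 3;

	/* Cheap CPU-local snapshot, in the spirit of the per-CPU tlb_gen. */
	static uint64_t local_gen = 2;

	/*
	 * Handle a request stamped with the generation it needs the local state
	 * to reach. The ordering is the whole point: the local comparison runs
	 * first, and the read of the shared counter is deferred until it is
	 * actually needed.
	 */
	static void handle_request(uint64_t request_gen)
	{
		if (request_gen <= local_gen) {
			/* Already up to date for this request: skip the contended read. */
			printf("gen %llu: nothing to do, shared counter not touched\n",
			       (unsigned long long)request_gen);
			return;
		}

		/* Only now pay for the read of the contended shared counter. */
		uint64_t gen = atomic_load(&shared_gen);

		/* ... do the work needed to catch up to 'gen' ... */
		local_gen = gen;
		printf("gen %llu: caught up to shared gen %llu\n",
		       (unsigned long long)request_gen, (unsigned long long)gen);
	}

	int main(void)
	{
		handle_request(1);	/* already covered: no shared read */
		handle_request(3);	/* behind: read the shared counter and catch up */
		return 0;
	}

The patch applies the same ordering inside flush_tlb_func(): return early when
f->new_tlb_gen <= local_tlb_gen, and only otherwise read
loaded_mm->context.tlb_gen.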