From: Nadav Amit
Subject: Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
Date: Thu, 7 Jul 2022 21:23:44 -0700
To: Hugh Dickins
Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
 Andy Lutomirski, Thomas Gleixner, x86@kernel.org, linux-mm@kvack.org
Message-Id: <904C4BCE-78E7-4FEE-BD8D-03DCE75A5B8B@gmail.com>
References: <20220606180123.2485171-1-namit@vmware.com>

On Jul 7, 2022, at 8:27 PM, Hugh Dickins wrote:

> On Mon, 6 Jun 2022, Nadav Amit wrote:
>
>> From: Nadav Amit
>>
>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>> contended and reading it should (arguably) be avoided as much as
>> possible.
>>
>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>> even when it is not necessary (e.g., the mm was already switched).
>> This is wasteful.
>>
>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>> see if there are additional in-flight TLB invalidations and flush the
>> entire TLB in such a case. However, if the request's tlb_gen was already
>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>> by the overhead of the check itself.
>>
>> Running will-it-scale with tlb_flush1_threads show a considerable
>> benefit on 56-core Skylake (up to +24%):
>>
>> threads   Baseline (v5.17+)   +Patch
>> 1         159960              160202
>> 5         310808              308378 (-0.7%)
>> 10        479110              490728
>> 15        526771              562528
>> 20        534495              587316
>> 25        547462              628296
>> 30        579616              666313
>> 35        594134              701814
>> 40        612288              732967
>> 45        617517              749727
>> 50        637476              735497
>> 55        614363              778913 (+24%)
>>
>> Acked-by: Peter Zijlstra (Intel)
>> Cc: Dave Hansen
>> Cc: Ingo Molnar
>> Cc: Andy Lutomirski
>> Cc: Thomas Gleixner
>> Cc: x86@kernel.org
>> Signed-off-by: Nadav Amit
>>
>> --
>>
>> Note: The benchmarked kernels include Dave's revert of commit
>> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
>> tlb_is_not_lazy()
>> ---
>> arch/x86/mm/tlb.c | 18 +++++++++++++++++-
>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index d400b6d9d246..d9314cc8b81f 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
>>  	const struct flush_tlb_info *f = info;
>>  	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>  	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
>> -	u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>  	u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
>>  	bool local = smp_processor_id() == f->initiating_cpu;
>>  	unsigned long nr_invalidate = 0;
>> +	u64 mm_tlb_gen;
>>
>>  	/* This code cannot presently handle being reentered. */
>>  	VM_WARN_ON(!irqs_disabled());
>> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
>>  		return;
>>  	}
>>
>> +	if (f->new_tlb_gen <= local_tlb_gen) {
>> +		/*
>> +		 * The TLB is already up to date in respect to f->new_tlb_gen.
>> +		 * While the core might be still behind mm_tlb_gen, checking
>> +		 * mm_tlb_gen unnecessarily would have negative caching effects
>> +		 * so avoid it.
>> +		 */
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * Defer mm_tlb_gen reading as long as possible to avoid cache
>> +	 * contention.
>> +	 */
>> +	mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>> +
>>  	if (unlikely(local_tlb_gen == mm_tlb_gen)) {
>>  		/*
>>  		 * There's nothing to do: we're already up to date. This can
>> --
>> 2.25.1
>
> I'm sorry, but bisection and reversion show that this commit,
> aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
> is responsible for the "internal compiler error: Segmentation fault"s
> I get when running kernel builds on tmpfs in 1G memory, lots of swapping.
>
> That tmpfs is using huge pages as much as it can, so splitting and
> collapsing, compaction and page migration entailed, in case that's
> relevant (maybe this commit is perfect, but there's a TLB flushing
> bug over there in mm which this commit just exposes).
>
> Whether those segfaults happen without the huge page element,
> I have not done enough testing to tell - there are other bugs with
> swapping in current linux-next, indeed, I wouldn't even have found
> this one, if I hadn't already been on a bisection for another bug,
> and got thrown off course by these segfaults.
>
> I hope that you can work out what might be wrong with this,
> but meantime I think it needs to be reverted.

I always find it surprising how trivial one-liners fail. As you probably
know, debugging these kinds of things is hard. I see two possible cases:

1. The failure is directly related to this optimization. The immediate
   suspect in my mind is something to do with PCID/ASID.

2. The failure is due to another bug that was papered over by “enough” TLB
   flushes.

I will look into the code.
But if it is possible, it would be helpful to know whether you get the
failure with the “nopcid” kernel parameter. If it passes, it wouldn’t say
much, but if it fails, I think (2) is more likely.

Not arguing about a revert, but, in some way, if the test fails, it can
indicate that the optimization “works”…

I’ll put in some time to look deeper into the code, but it would be very
helpful if you can let me know what happens with nopcid.