Subject: Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
From: Nadav Amit
To: Hugh Dickins
Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
 Andy Lutomirski, Thomas Gleixner, x86@kernel.org, linux-mm@kvack.org
Date: Thu, 7 Jul 2022 22:56:25 -0700
Message-Id: <575B908D-A29B-40B0-9A80-76B7E7A9762E@gmail.com>
In-Reply-To: <904C4BCE-78E7-4FEE-BD8D-03DCE75A5B8B@gmail.com>
References: <20220606180123.2485171-1-namit@vmware.com>
 <904C4BCE-78E7-4FEE-BD8D-03DCE75A5B8B@gmail.com>

On Jul 7, 2022, at 9:23 PM, Nadav Amit wrote:

> On Jul 7, 2022, at 8:27 PM, Hugh Dickins wrote:
>
>> On Mon, 6 Jun 2022, Nadav Amit wrote:
>>
>>> From: Nadav Amit
>>>
>>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>>> contended and reading it should (arguably) be avoided as much as
>>> possible.
>>>
>>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>>> even when it is not necessary (e.g., the mm was already switched).
>>> This is wasteful.
>>>
>>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>>> see if there are additional in-flight TLB invalidations and flush the
>>> entire TLB in such a case. However, if the request's tlb_gen was already
>>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>>> by the overhead of the check itself.
>>>
>>> Running will-it-scale with tlb_flush1_threads shows a considerable
>>> benefit on 56-core Skylake (up to +24%):
>>>
>>> threads  Baseline (v5.17+)  +Patch
>>> 1        159960             160202
>>> 5        310808             308378  (-0.7%)
>>> 10       479110             490728
>>> 15       526771             562528
>>> 20       534495             587316
>>> 25       547462             628296
>>> 30       579616             666313
>>> 35       594134             701814
>>> 40       612288             732967
>>> 45       617517             749727
>>> 50       637476             735497
>>> 55       614363             778913  (+24%)
>>>
>>> Acked-by: Peter Zijlstra (Intel)
>>> Cc: Dave Hansen
>>> Cc: Ingo Molnar
>>> Cc: Andy Lutomirski
>>> Cc: Thomas Gleixner
>>> Cc: x86@kernel.org
>>> Signed-off-by: Nadav Amit
>>>
>>> --
>>>
>>> Note: The benchmarked kernels include Dave's revert of commit
>>> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
>>> tlb_is_not_lazy()")
>>> ---
>>>  arch/x86/mm/tlb.c | 18 +++++++++++++++++-
>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>>> index d400b6d9d246..d9314cc8b81f 100644
>>> --- a/arch/x86/mm/tlb.c
>>> +++ b/arch/x86/mm/tlb.c
>>> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
>>>  	const struct flush_tlb_info *f = info;
>>>  	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>>  	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
>>> -	u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>>  	u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
>>>  	bool local = smp_processor_id() == f->initiating_cpu;
>>>  	unsigned long nr_invalidate = 0;
>>> +	u64 mm_tlb_gen;
>>>
>>>  	/* This code cannot presently handle being reentered. */
>>>  	VM_WARN_ON(!irqs_disabled());
>>> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
>>>  		return;
>>>  	}
>>>
>>> +	if (f->new_tlb_gen <= local_tlb_gen) {
>>> +		/*
>>> +		 * The TLB is already up to date in respect to f->new_tlb_gen.
>>> +		 * While the core might be still behind mm_tlb_gen, checking
>>> +		 * mm_tlb_gen unnecessarily would have negative caching effects
>>> +		 * so avoid it.
>>> +		 */
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Defer mm_tlb_gen reading as long as possible to avoid cache
>>> +	 * contention.
>>> +	 */
>>> +	mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>> +
>>>  	if (unlikely(local_tlb_gen == mm_tlb_gen)) {
>>>  		/*
>>>  		 * There's nothing to do: we're already up to date. This can
>>> --
>>> 2.25.1
>>
>> I'm sorry, but bisection and reversion show that this commit,
>> aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
>> is responsible for the "internal compiler error: Segmentation fault"s
>> I get when running kernel builds on tmpfs in 1G memory, lots of swapping.
>>
>> That tmpfs is using huge pages as much as it can, so splitting and
>> collapsing, compaction and page migration are entailed, in case that's
>> relevant (maybe this commit is perfect, but there's a TLB flushing
>> bug over there in mm which this commit just exposes).
>>
>> Whether those segfaults happen without the huge page element,
>> I have not done enough testing to tell; there are other bugs with
>> swapping in current linux-next. Indeed, I wouldn't even have found
>> this one if I hadn't already been on a bisection for another bug
>> and got thrown off course by these segfaults.
>>
>> I hope that you can work out what might be wrong with this,
>> but meantime I think it needs to be reverted.
>
> I find it always surprising how trivial one-liners fail.
>
> As you probably know, debugging these kinds of things is hard. I see two
> possible cases:
>
> 1. The failure is directly related to this optimization. The immediate
> suspect in my mind is something to do with PCID/ASID.
>
> 2. The failure is due to another bug that was papered over by “enough” TLB
> flushes.
>
> I will look into the code. But if it is possible, it would be helpful to
> know whether you get the failure with the “nopcid” kernel parameter. If it
> passes, it wouldn’t say much, but if it fails, I think (2) is more likely.
>
> Not arguing about a revert, but, in some way, if the test fails, it can
> indicate that the optimization “works”…
>
> I’ll put some time into looking deeper at the code, but it would be very
> helpful if you can let me know what happens with nopcid.

Actually, using only “nopcid” would most likely make the problem go away if
we have PTI enabled. So to get a good indication, we need to check whether it
reproduces with both “nopti” and “nopcid”.

I don’t have a better answer yet. Still trying to see what might have gone
wrong.
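
To spell out the pattern the patch above relies on: do a cheap, CPU-local
generation comparison first, and only read the heavily contended shared
generation counter when that comparison says more work may be needed. The
sketch below is a minimal standalone illustration in plain C11; the names
(shared_gen, local_gen, handle_request) are made up for this example, and it
is not the kernel's flush_tlb_func()/tlb_gen code.

	/* Illustrative sketch only: plain C11 atomics, made-up names. */
	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Heavily contended shared counter, in the spirit of mm->context.tlb_gen. */
	static _Atomic uint64_t shared_gen = 3;

	/* Cheap CPU-local snapshot, in the spirit of the per-CPU tlb_gen. */
	static uint64_t local_gen = 2;

	/*
	 * Handle a request stamped with the generation it needs the local state
	 * to reach. The ordering is the whole point: the local comparison runs
	 * first, and the read of the shared counter is deferred until it is
	 * actually needed.
	 */
	static void handle_request(uint64_t request_gen)
	{
		if (request_gen <= local_gen) {
			/* Already up to date for this request: skip the contended read. */
			printf("gen %llu: nothing to do, shared counter not touched\n",
			       (unsigned long long)request_gen);
			return;
		}

		/* Only now pay for the read of the contended shared counter. */
		uint64_t gen = atomic_load(&shared_gen);

		/* ... do the work needed to catch up to 'gen' ... */
		local_gen = gen;
		printf("gen %llu: caught up to shared gen %llu\n",
		       (unsigned long long)request_gen, (unsigned long long)gen);
	}

	int main(void)
	{
		handle_request(1);	/* already covered: no shared read */
		handle_request(3);	/* behind: read the shared counter and catch up */
		return 0;
	}

The patch applies the same ordering inside flush_tlb_func(): return early when
f->new_tlb_gen <= local_tlb_gen, and only otherwise read
loaded_mm->context.tlb_gen.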