Subject: Re: Excessive TLB flush ranges
From: Nadav Amit
Date: Tue, 16 May 2023 18:23:27 -0700
To: Thomas Gleixner
Cc: Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton, linux-mm,
 Christoph Hellwig, Lorenzo Stoakes, Peter Zijlstra, Baoquan He,
 John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland,
 Marc Zyngier, x86@kernel.org
In-Reply-To: <87353v7qms.ffs@tglx>
References: <87a5y5a6kj.ffs@tglx> <87353x9y3l.ffs@tglx> <87zg658fla.ffs@tglx>
 <87r0rg93z5.ffs@tglx> <87cz308y3s.ffs@tglx> <87y1lo7a0z.ffs@tglx>
 <87o7mk733x.ffs@tglx> <7ED917BC-420F-47D4-8956-8984205A75F0@gmail.com>
 <87bkik6pin.ffs@tglx> <87353v7qms.ffs@tglx>

> On May 16, 2023, at 5:23 PM, Thomas Gleixner wrote:
>
> On Tue, May 16 2023 at 21:32, Thomas Gleixner wrote:
>> On Tue, May 16 2023 at 10:56, Nadav Amit wrote:
>>>> On May 16, 2023, at 7:38 AM, Thomas Gleixner wrote:
>>>>
>>>> There is a world outside of x86, but even on x86 it's borderline silly
>>>> to take the whole TLB out when you can flush 3 TLB entries one by one
>>>> with exactly the same number of IPIs, i.e. _one_. No?
>>>
>>> I just want to re-raise points that were made in the past, including in
>>> the discussion that I sent before, and that match my experience.
>>>
>>> Feel free to reject them, but I think you should not ignore them.
>>
>> I'm not ignoring them and I'm well aware of these issues. No need to
>> repeat them over and over. I'm old but not senile yet.

Thomas, no disrespect was intended. I initially just sent the link, and I
had a sense (based on my past experience) that nobody clicked on it.

> Just to be clear: this works the other way round too.
>
> It makes a whole lot of difference whether you do 5 IPIs in a row
> which all need to get a cache line updated, or if you have _one_ which
> needs a couple of cache lines updated.

Obviously, if the question is 5 IPIs versus 1 IPI with more flushing data,
the 1 IPI wins. The question I was focusing on is whether that one IPI
should do a potentially global flush or walk a detailed list of ranges to
flush.

> INVLPG is not serializing so the CPU can pull in the next required cache
> line(s) on the VA list during that.

Indeed, but ChatGPT says (yes, I see you making fun of me already):

"however, this doesn't mean INVLPG has no impact on the pipeline. INVLPG
can cause a pipeline stall because the TLB entry invalidation must be
completed before subsequent instructions that might rely on the TLB can
be executed correctly."

So I am not sure that your claim is exactly correct.

> These cache lines are _not_
> contended at that point because _all_ of these data structures are no
> longer globally accessible (mis-speculation aside) and therefore not
> exclusive (misalignment aside, but you have to prove that this is an
> issue).

This is not entirely true. Indeed, whether you have 1 remote core or N
remote cores is not the whole issue (putting NUMA aside). But you will
first get a snoop of the initiator's cache line by the responding core,
and then, after the TLB invalidation is completed, an RFO by the
initiator once it writes to that line again. If the invalidation data is
on the stack (as you did), this is even more likely to happen shortly
after.

> So just dismissing this based on 10-year-old experience is not really
> helpful, though I'm happy to confirm your points once I have had the
> time and opportunity to actually run real testing over it, unless you
> beat me to it.

I really don't know what "dismissing" you are talking about.
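Just so we are talking about the same pattern: a minimal sketch of a
one-IPI range-list flush, with a made-up structure, handler and threshold
(the real x86 decision logic lives around tlb_single_page_flush_ceiling
in arch/x86/mm/tlb.c):

	/* Made-up illustration, not the kernel's actual code. */
	struct flush_list {
		unsigned int	nr;
		unsigned long	start[8];
		unsigned long	end[8];
	};

	static void flush_ranges_ipi(void *info)
	{
		/* First touch snoops the initiator's cache line(s). */
		struct flush_list *fl = info;
		unsigned long npages = 0, va;
		unsigned int i;

		for (i = 0; i < fl->nr; i++)
			npages += (fl->end[i] - fl->start[i]) >> PAGE_SHIFT;

		if (npages > FLUSH_CEILING) {		/* made-up constant */
			flush_tlb_local();		/* one full flush */
			return;
		}

		for (i = 0; i < fl->nr; i++)
			for (va = fl->start[i]; va < fl->end[i]; va += PAGE_SIZE)
				flush_tlb_one_kernel(va); /* INVLPG per page */
	}

	static void flush_ranges(struct flush_list *fl)
	{
		/*
		 * fl typically sits on the initiator's stack: responders
		 * snoop it, and the initiator takes an RFO when it next
		 * writes that line.
		 */
		on_each_cpu(flush_ranges_ipi, fl, 1);
	}

That stack-resident flush_list is exactly the data whose cache lines
bounce: first the responder's snoop, then the initiator's RFO.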
I do have relatively recent experience with the overhead of caching
effects on TLB shootdown time. It can become very apparent. You can find
some numbers in, for instance, the patch of mine I quoted in my previous
email.

There are additional opportunities to reduce the caching effect on x86,
such as combining the SMP-code metadata with the TLB-invalidation
metadata (which is out of scope here); I saw that having a performance
benefit. That's all to say that the caching effect is not something to be
considered obsolete.

> What I can confirm is that it solves a real world problem on !x86
> machines for the pathological case at hand.
>
> On the affected contemporary ARM32 machine, which does not require
> IPIs, the selective flush is way better than:
>
>  - the silly 1.G range one-page-by-one flush (which is silly on its
>    own as there is no range check)
>
>  - a full TLB flush just for 3 pages, which is the same on x86, albeit
>    the flush range is ~64GB there.
>
> The point is that the generic vmalloc code is making assumptions which
> are x86-centric and not even necessarily true on x86.
>
> Whether or not this is beneficial on x86 is a completely separate
> debate.

I fully understand that if you reduce multiple TLB shootdowns (IPI-wise)
into one, it is (pretty much) all benefit and there is no tradeoff. I was
focusing on the question of whether precise TLB flushing is also
beneficial, where the tradeoff is less clear (especially since the kernel
uses 2MB pages).

My experience with non-IPI-based TLB invalidations is more limited. IIUC,
the usage model is that the TLB shootdowns should be invoked ASAP
(perhaps each range can be batched, but there is no sense in batching
multiple ranges), and then later you issue some barrier to ensure that
prior TLB shootdown invocations have completed. If that is the (use)
case, I am not sure the abstraction you used in your prototype is the
best one.

> There is also a debate required whether a wholesale "flush on _ALL_
> CPUs" is justified when some of those CPUs are completely isolated and
> have absolutely no chance to be affected by that. This process-bound
> seccomp/BPF muck clearly does not justify kicking isolated CPUs out of
> their computation in user space just because…

I hope you would excuse my ignorance (I am sure you won't), but aren't
the seccomp/BPF VMAP ranges mapped in all processes (considering PTI, of
course)? Are you suggesting you want a per-process kernel address space?
(Which can make sense, I guess.)
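Going back to the non-IPI usage model for a second, this is roughly the
shape of API I would expect; both function names are made up (the final
sync being, e.g., a DSB ISH on arm64):

	struct vmap_range { unsigned long start, end; };

	static void flush_vmap_ranges(struct vmap_range *r, int nr)
	{
		int i;

		/*
		 * Kick off each broadcast invalidation as soon as the
		 * range is known; no cross-range batching.
		 */
		for (i = 0; i < nr; i++)
			flush_tlb_kernel_range_nosync(r[i].start, r[i].end);

		/* Wait once for all prior invalidations to complete. */
		flush_tlb_kernel_range_sync();
	}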