From: Linus Torvalds <torvalds@linuxfoundation.org>
Date: Sat, 8 Jan 2022 16:27:48 -0800
Subject: Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
To: Andy Lutomirski <luto@kernel.org>, Will Deacon, Catalin Marinas
Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
    Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
    the arch/x86 maintainers, Rik van Riel, Dave Hansen,
    Peter Zijlstra (Intel), Nadav Amit, Mathieu Desnoyers
In-Reply-To: <3586aa63-2dd2-4569-b9b9-f51080962ff2@www.fastmail.com>

On Sat, Jan 8, 2022 at 2:04 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> So this requires that all architectures actually walk all relevant
> CPUs to see if an IPI is needed and send that IPI. On architectures
> that actually need an IPI anyway (x86 bare metal, powerpc (I think),
> and others), fine. But on architectures with a broadcast-to-all-CPUs
> flush (ARM64, IIUC), the extra IPI will be much, much slower than a
> simple load-acquire in a loop.

... hmm. How about a hybrid scheme?

 (a) Architectures that already require that IPI anyway for TLB
invalidation (i.e. x86, but others too) just make the rule be that the
TLB flush done by exit_mmap() gets rid of any lazy TLB mm references.
Which they already do.

 (b) Architectures like arm64 that do hw-assisted TLB shootdown will
have an ASID allocator model, and what you do is use that to either

     (b') increment/decrement the mm_count at mm ASID
allocation/freeing time, or

     (b'') use the existing ASID tracking data to find the CPUs that
have that ASID.

 (c) Can you really imagine hardware TLB shootdown without ASID
allocation? That doesn't seem to make sense. But if it exists, maybe
that kind of crazy case would do the percpu array walking.

(Honesty in advertising: I don't know the arm64 ASID code - I used to
know the old alpha version I wrote in a previous lifetime - but afaik
any ASID allocator has to be able to track the CPUs that have a
particular ASID in use and be able to invalidate it.)

Hmm. The x86 maintainers are on this thread, but they aren't even the
problem. Adding Catalin and Will to this; I think they should know
if/how this would fit with the arm64 ASID allocator.
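To make the comparison concrete, here's a rough sketch of the two
models (completely untested, and every name in it - lazy_mm_hazard,
drop_lazy_mm_ipi, mm_asid_allocated, mm_asid_freed - is made up for
illustration; this is not the code from Andy's series or the actual
arm64 allocator):

/*
 * Hedged sketch only: the identifiers below are hypothetical.
 */
#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/sched/mm.h>
#include <linux/smp.h>

/* Per-CPU hazard pointer: set while a CPU uses this mm as its lazy mm. */
static DEFINE_PER_CPU(struct mm_struct *, lazy_mm_hazard);

/* IPI handler: get the target CPU off the dying mm. */
static void drop_lazy_mm_ipi(void *info)
{
	struct mm_struct *mm = info;

	if (this_cpu_read(lazy_mm_hazard) == mm)
		this_cpu_write(lazy_mm_hazard, NULL);	/* would switch to init_mm for real */
}

/*
 * The walk Andy describes: at mm teardown, check every CPU's hazard
 * pointer with a load-acquire and IPI only the CPUs still holding a
 * lazy reference.  On x86 the exit_mmap() TLB-flush IPI already does
 * this job; on arm64 (broadcast TLBI, no flush IPI) the walk plus any
 * IPI is pure added cost.
 */
static void shoot_down_lazy_users(struct mm_struct *mm)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		/* Pairs with a release store when a CPU takes mm as its lazy mm. */
		if (smp_load_acquire(&per_cpu(lazy_mm_hazard, cpu)) == mm)
			smp_call_function_single(cpu, drop_lazy_mm_ipi, mm, 1);
	}
}

/*
 * Option (b') above: tie the reference to ASID lifetime instead.  One
 * mmgrab() when the allocator hands the mm an ASID, one mmdrop() when
 * the ASID is reclaimed, and teardown needs no per-CPU walk at all.
 */
static void mm_asid_allocated(struct mm_struct *mm)
{
	mmgrab(mm);	/* hold mm_count while the ASID is live */
}

static void mm_asid_freed(struct mm_struct *mm)
{
	mmdrop(mm);	/* the lazy reference goes away with the ASID */
}

The point of (b') being that the refcount traffic then happens at ASID
allocation/free frequency rather than on every context switch, and mm
teardown never has to go poking at other CPUs at all.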
Will/Catalin, background here:

  https://lore.kernel.org/all/CAHk-=wj4LZaFB5HjZmzf7xLFSCcQri-WWqOEJHwQg0QmPRSdQA@mail.gmail.com/

for my objection to that special "keep a non-refcounted magic per-cpu
pointer to the lazy TLB mm" scheme.

               Linus