Subject: Re: [PATCH 09/12] x86/mm: enable broadcast TLB invalidation for multi-threaded processes
From: Rik van Riel
To: Jann Horn
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, akpm@linux-foundation.org, nadav.amit@gmail.com, zhengqi.arch@bytedance.com, linux-mm@kvack.org
Date: Mon, 06 Jan 2025 09:26:14 -0500
References: <20241230175550.4046587-1-riel@surriel.com> <20241230175550.4046587-10-riel@surriel.com> <96d1b007b3b88c43feac58d2646d675b94daf1fb.camel@surriel.com>
On Mon, 2025-01-06 at 14:04 +0100, Jann Horn wrote:
> On Sat, Jan 4, 2025 at 3:55 AM Rik van Riel wrote:
>
> >
> > Then the only change needed to switch_mm_irqs_off
> > would be to move the LOADED_MM_SWITCHING line to
> > before choose_new_asid, to fully close the window.
> >
> > Am I overlooking anything here?
>
> I think that might require having a full memory barrier in
> switch_mm_irqs_off to ensure that the write of LOADED_MM_SWITCHING
> can't be reordered after reads in choose_new_asid(). Which wouldn't be
> very nice; we probably should avoid adding heavy barriers to the task
> switch path...
>
> Hmm, but I think luckily the cpumask_set_cpu() already implies a
> relaxed RMW atomic, which I think on X86 is actually the same as a
> sequentially consistent atomic, so as long as you put the
> LOADED_MM_SWITCHING line before that, it might do the job? Maybe with
> an smp_mb__after_atomic() and/or an explainer comment.
> (smp_mb__after_atomic() is a no-op on x86, so maybe just a comment is
> the right way. Documentation/memory-barriers.txt says
> smp_mb__after_atomic() can be used together with atomic RMW bitop
> functions.)
>
That no-op smp_mb__after_atomic() might be the way to go, since we do
not actually use the mm_cpumask with INVLPGB, and we could conceivably
skip updates to the bitmask for tasks using broadcast TLB flushing.

> >
> > I'll add the READ_ONCE.
> >
> > Will the race still exist if we wait on
> > LOADED_MM_SWITCHING as proposed above?
>
> I think so, since between reading the loaded_mm and reading the
> loaded_mm_asid, the remote CPU might go through an entire task
> switch. Like:
>
> 1. We read the loaded_mm, and see that the remote CPU is currently
> running in our mm_struct.
> 2. The remote CPU does a task switch to another process with a
> different mm_struct.
> 3. We read the loaded_mm_asid, and see an ASID that does not match
> our broadcast ASID (because the loaded ASID is not for our mm_struct).
>
A false positive, where we do not clear the asid_transition field and
will check again in the future, should be harmless, though. The worry
is false negatives, where we fail to detect an out-of-sync CPU, yet
still clear the asid_transition field.

-- 
All Rights Reversed.