Date: Tue, 20 Jun 2023 10:24:29 -0700
From: "Andy Lutomirski"
To: "Nadav Amit", "Song Liu"
Cc: "Mike Rapoport", "Mark Rutland", "Kees Cook", "Linux Kernel Mailing List", "Andrew Morton", "Catalin Marinas", "Christophe Leroy", "David S. Miller", "Dinh Nguyen", "Heiko Carstens", "Helge Deller", "Huacai Chen", "Kent Overstreet", "Luis Chamberlain", "Michael Ellerman", "Naveen N. Rao", "Palmer Dabbelt", "Puranjay Mohan", "Rick P Edgecombe", "Russell King (Oracle)", "Steven Rostedt", "Thomas Bogendoerfer", "Thomas Gleixner", "Will Deacon", bpf, linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org, linux-mm, linux-modules@vger.kernel.org, linux-parisc@vger.kernel.org, linux-riscv@lists.infradead.org, linux-s390, linux-trace-kernel@vger.kernel.org, linuxppc-dev, loongarch@lists.linux.dev, netdev@vger.kernel.org, sparclinux@vger.kernel.org, "the arch/x86 maintainers"
Subject: Re: [PATCH v2 02/12] mm: introduce execmem_text_alloc() and jit_text_alloc()
Message-Id: <6145cabf-d016-4dba-b5d2-0fb793352058@app.fastmail.com>
In-Reply-To: <7F566E60-C371-449B-992B-0C435AD6016B@gmail.com>
References: <20230616085038.4121892-1-rppt@kernel.org>
 <20230616085038.4121892-3-rppt@kernel.org>
 <20230618080027.GA52412@kernel.org>
 <7F566E60-C371-449B-992B-0C435AD6016B@gmail.com>

On Mon, Jun 19, 2023, at 1:18 PM, Nadav Amit wrote:
>> On Jun 19, 2023, at 10:09 AM, Andy Lutomirski wrote:
>>
>> But jit_text_alloc() can't do this, because the order of operations doesn't match. With jit_text_alloc(), the executable mapping shows up before the text is populated, so there is no atomic change from not-there to populated-and-executable. Which means that there is an opportunity for CPUs, speculatively or otherwise, to start filling various caches with intermediate states of the text, which means that various architectures (even x86!) may need serialization.
>>
>> For eBPF- and module- like use cases, where JITting/code gen is quite coarse-grained, perhaps something vaguely like:
>>
>> jit_text_alloc() -> returns a handle and an executable virtual address, but does *not* map it there
>> jit_text_write() -> write to that handle
>> jit_text_map() -> map it and synchronize if needed (no sync needed on x86, I think)
>
> Andy, would you mind explaining why you think a sync is not needed? I
> mean I have a “feeling” that perhaps TSO can guarantee something based
> on the order of write and page-table update. Is that the argument?

Sorry, when I say "no sync" I mean no cross-CPU synchronization.
I'm assuming the underlying sequence of events is:

allocate physical pages (jit_text_alloc)

write to them (with MOV, memcpy, whatever), via the direct map or via a temporary mm

do an appropriate *local* barrier (which, on x86, is probably implied by TSO, as the subsequent pagetable change is at least a release; also, any previous temporary mm stuff would have done MOV CR3 afterwards, which is a full "serializing" barrier)

optionally zap the direct map via IPI, assuming the pages are direct mapped (but this could be avoided with a smart enough allocator and temporary_mm above)

install the final RX PTE (jit_text_map), which does a MOV or maybe a LOCK CMPXCHG16B. Note that the virtual address in question was not readable or executable before this, and all CPUs have serialized since the last time it was executable.

either jump to the new text locally, or:

1. Do a store-release to tell other CPUs that the text is mapped
2. Other CPU does a load-acquire to detect that the text is mapped and jumps to the text

This is all approximately the same thing that plain old mmap(..., PROT_EXEC, ...) does.

>
> On this regard, one thing that I clearly do not understand is why
> *today* it is ok for users of bpf_arch_text_copy() not to call
> text_poke_sync(). Am I missing something?

I cannot explain this, because I suspect the current code is wrong. But it's only wrong across CPUs, because bpf_arch_text_copy goes through text_poke_copy, which calls unuse_temporary_mm(), which is serializing. And it's plausible that most eBPF use cases don't actually cause the loaded program to get used on a different CPU without first serializing on the CPU that ends up using it. (Context switches and interrupts are serializing.)

FRED could make interrupts non-serializing. I sincerely hope that FRED doesn't cause this all to fall apart.

--Andy
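
For illustration only, the alloc -> write -> map ordering and the store-release / load-acquire handoff described above can be modelled in plain userspace C11. This is a rough sketch under loud assumptions: the jit_model_* names and MODEL_TEXT_MAX are hypothetical and exist nowhere in the kernel, the "mapping" is just an atomic flag standing in for the final RX PTE, and nothing here touches page tables, the direct map, or instruction fetch.

```c
/* Userspace model only: every name below is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TEXT_MAX 64

/* Hypothetical handle: a staging buffer plus a "mapped" flag. */
struct jit_model {
	unsigned char text[MODEL_TEXT_MAX]; /* stands in for the physical pages */
	size_t len;                         /* bytes written so far */
	atomic_bool mapped;                 /* stands in for the final RX PTE */
};

/* "Allocate": set up the handle; nothing is visible to consumers yet. */
static void jit_model_alloc(struct jit_model *h)
{
	memset(h->text, 0, sizeof(h->text));
	h->len = 0;
	atomic_init(&h->mapped, false);
}

/* "Write": populate via the handle; refused once the text is mapped. */
static bool jit_model_write(struct jit_model *h, const void *src, size_t n)
{
	if (atomic_load_explicit(&h->mapped, memory_order_relaxed))
		return false;
	if (n > MODEL_TEXT_MAX - h->len)
		return false;
	memcpy(h->text + h->len, src, n);
	h->len += n;
	return true;
}

/* "Map": the release store orders all earlier writes before the flag. */
static void jit_model_map(struct jit_model *h)
{
	atomic_store_explicit(&h->mapped, true, memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above.
 * Returns NULL (and leaves *len alone) until the text is mapped. */
static const unsigned char *jit_model_lookup(struct jit_model *h, size_t *len)
{
	if (!atomic_load_explicit(&h->mapped, memory_order_acquire))
		return NULL;
	*len = h->len;
	return h->text;
}
```

A producer would call jit_model_alloc(), then jit_model_write() one or more times, then jit_model_map(); any other thread that sees a non-NULL result from jit_model_lookup() is guaranteed by the release/acquire pairing to see the fully written bytes. The model says nothing about whether a real CPU needs additional serialization before executing freshly written text, which is the open question in this thread.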