Date: Wed, 9 Nov 2022 13:17:46 +0200
From: Mike Rapoport
To: "Edgecombe, Rick P"
Cc: song@kernel.org, peterz@infradead.org, bpf@vger.kernel.org, linux-mm@kvack.org, hch@lst.de, x86@kernel.org, akpm@linux-foundation.org, mcgrof@kernel.org, "Lu, Aaron"
Subject: Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs

On Tue, Nov 08, 2022 at 04:51:12PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2022-11-08 at 13:27 +0200, Mike Rapoport wrote:
> > > Based on our experiments [5], we measured a 0.5% performance
> > > improvement from bpf_prog_pack. This patchset further boosts the
> > > improvement to 0.7%. The difference is because bpf_prog_pack uses
> > > 512x 4kB pages instead of 1x 2MB page, so bpf_prog_pack as-is
> > > doesn't resolve #2 above.
> > >
> > > This patchset replaces bpf_prog_pack with a better API and makes it
> > > available for other dynamic kernel text, such as modules, ftrace,
> > > kprobe.
> >
> > The proposed execmem_alloc() looks to me very much tailored for x86
> > to be used as a replacement for module_alloc(). Some architectures
> > have module_alloc() that is quite different from the default or x86
> > version, so I'd expect at least some explanation of how modules etc.
> > can use execmem_ APIs without breaking !x86 architectures.
>
> I think this is fair, but I think we should ask ourselves: how much
> should we do in one step?

I think that, at the least, we need evidence that execmem_alloc() etc. can
actually be used by modules/ftrace/kprobes. Luis said that RFC v2 didn't
work for him at all, so having a core MM API for code allocation that only
works with BPF on x86 doesn't seem right to me.
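The quoted cover letter attributes the gap between 0.5% and 0.7% to bpf_prog_pack mapping its 2MB region with 512x 4kB pages rather than one 2MB page. The arithmetic behind that can be sketched as follows, assuming one TLB entry per leaf page mapping (many CPUs actually keep separate iTLB capacities per page size, so this is only an illustration). The helper `mappings_needed()` is hypothetical.

```c
#include <assert.h>

/* Page sizes mentioned in the quoted cover letter. */
#define SZ_4K (4UL * 1024)
#define SZ_2M (2UL * 1024 * 1024)

/*
 * Number of leaf page-table mappings needed to cover `region` bytes
 * with pages of size `page`, rounding up for a partial last page.
 * Assuming one TLB entry per mapping, this is also the number of
 * iTLB entries needed to keep the whole region's translations cached.
 */
unsigned long mappings_needed(unsigned long region, unsigned long page)
{
    assert(page != 0);
    return (region + page - 1) / page;
}
```

Covering the same 2MB with `SZ_4K` pages needs 512 mappings, versus 1 with a `SZ_2M` page.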

> For non-text_poke() architectures, the way you can make it work is to
> have the API look like:
> execmem_alloc()  <- Does the allocation, but not necessarily usable yet
> execmem_write()  <- Loads the mapping, doesn't work after finish()
> execmem_finish() <- Makes the mapping live (loaded, executable, ready)
>
> So for text_poke():
> execmem_alloc()  <- Reserves the mapping
> execmem_write()  <- Uses text_poke() to write to the mapping
> execmem_finish() <- Does nothing
>
> And non-text_poke():
> execmem_alloc()  <- Allocates a regular RW vmalloc allocation
> execmem_write()  <- Writes normally to it
> execmem_finish() <- Does set_memory_ro()/set_memory_x() on it
>
> Non-text_poke() only gets the benefits of centralized logic, but the
> interface works for both. This is pretty much what the perm_alloc() RFC
> did to make it work with other arches and modules. But to fit with the
> existing modules code (which is actually spread all over) and also
> handle RO sections, it also needed some additional bells and whistles.

I'm less concerned about the non-text_poke() part than about the
restrictions on where code and data can live on different architectures,
and whether these restrictions will prevent using the centralized logic
on, say, arm64 and powerpc.

For instance, if we use execmem_alloc() for modules, it means that data
sections should be allocated separately with plain vmalloc(). Will this
work universally? Or will it require special care, with additional
complexity in the modules code?

> So the question I'm trying to ask is, how much should we target for the
> next step? I first thought that this functionality was so intertwined
> that it would be too hard to do iteratively. So if we want to try
> iteratively, I'm OK if it doesn't solve everything.

With execmem_alloc() as the first step, I'm failing to see the bigger
picture. If we want to use it for modules, how will we allocate RO data?
With a similar rodata_alloc() that uses yet another tree in vmalloc?
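For the non-text_poke() case, the three phases Rick lists map onto ordinary page-protection changes. The sketch below is a user-space analogue only, assuming POSIX mmap()/mprotect() stand in for a RW vmalloc allocation and for set_memory_ro()/set_memory_x(). The `*_sketch` functions and `struct execmem_region` are hypothetical and are not the API proposed in the patchset.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for one execmem allocation. */
struct execmem_region {
    void *addr;
    size_t size;
    int finished; /* set once the mapping is read-only + executable */
};

/* Phase 1: allocate a plain RW mapping (like a RW vmalloc area). */
int execmem_alloc_sketch(struct execmem_region *r, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    /* Round up to a whole number of pages (page size is a power of 2). */
    size = (size + (size_t)page - 1) & ~((size_t)page - 1);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->addr = p;
    r->size = size;
    r->finished = 0;
    return 0;
}

/* Phase 2: ordinary writes; refused (not attempted) once finished. */
int execmem_write_sketch(struct execmem_region *r, size_t off,
                         const void *src, size_t len)
{
    if (r->finished || off > r->size || len > r->size - off)
        return -1;
    memcpy((char *)r->addr + off, src, len);
    return 0;
}

/* Phase 3: make it RO+X, analogous to set_memory_ro() + set_memory_x(). */
int execmem_finish_sketch(struct execmem_region *r)
{
    if (mprotect(r->addr, r->size, PROT_READ | PROT_EXEC) != 0)
        return -1;
    r->finished = 1;
    return 0;
}

void execmem_free_sketch(struct execmem_region *r)
{
    munmap(r->addr, r->size);
    r->addr = NULL;
    r->size = 0;
}
```

A caller allocates, writes its code with `execmem_write_sketch()`, then calls `execmem_finish_sketch()`; any later write is rejected, mirroring "doesn't work after finish()". In the text_poke() variant, the write phase would instead patch an already read-only, executable mapping.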
How can the caching of large pages in vmalloc be made useful for use
cases like secretmem and PKS?

--
Sincerely yours,
Mike.
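One concrete form of the code/data placement restriction raised in this message: on x86-64, rel32 calls/jumps and RIP-relative data references use a signed 32-bit displacement, so if code and a separately allocated data area are linked this way, they must end up within about ±2 GiB of each other. Other architectures have different reaches (for example, arm64 ADRP covers ±4 GiB), which is one reason a single allocation policy may not fit them all. A minimal check, using a hypothetical helper `fits_rel32()`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if an instruction whose next-instruction address is
 * `next_ip` can reach `target` with a signed 32-bit PC-relative
 * displacement, 0 otherwise.
 */
int fits_rel32(uint64_t next_ip, uint64_t target)
{
    /* Unsigned subtraction wraps; reinterpret as signed (two's
     * complement on GCC/Clang) to get the real displacement. */
    int64_t disp = (int64_t)(target - next_ip);
    assert(INT32_MIN < 0 && INT32_MAX > 0);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

Data placed 4 KiB past the code fits, while data placed 2 GiB or more away does not.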