Date: Wed, 17 May 2023 10:18:11 -0400
From: Kent Overstreet
To: Mike Rapoport
Cc: Matthew Wilcox, Kees Cook, Johannes Thumshirn,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-bcachefs@vger.kernel.org, Kent Overstreet, Andrew Morton,
	Uladzislau Rezki, hch@infradead.org, linux-mm@kvack.org,
	linux-hardening@vger.kernel.org, song@kernel.org
Subject: Re: [PATCH 07/32] mm: Bring back vmalloc_exec
References: <20230509165657.1735798-1-kent.overstreet@linux.dev>
	<20230509165657.1735798-8-kent.overstreet@linux.dev>
	<3508afc0-6f03-a971-e716-999a7373951f@wdc.com>
	<202305111525.67001E5C4@keescook>
	<202305161401.F1E3ACFAC@keescook>

On Wed, May 17, 2023 at 05:04:27PM +0300, Mike Rapoport wrote:
> On Wed, May 17, 2023 at 01:28:43AM -0400, Kent Overstreet wrote:
> > On Tue, May 16, 2023 at 10:47:13PM +0100, Matthew Wilcox wrote:
> > > On Tue, May 16, 2023 at 05:20:33PM -0400, Kent Overstreet wrote:
> > > > On Tue, May 16, 2023 at 02:02:11PM -0700, Kees Cook wrote:
> > > > > For something that small, why not use the text_poke API?
> > > >
> > > > This looks like it's meant for patching existing kernel text, which
> > > > isn't what I want - I'm generating new functions on the fly, one per
> > > > btree node.
> > > >
> > > > I'm working up a new allocator - a (very simple) slab allocator where
> > > > you pass a buffer, and it gives you a copy of that buffer mapped
> > > > executable, but not writeable.
> > > >
> > > > It looks like we'll be able to convert bpf, kprobes, and ftrace
> > > > trampolines to it; it'll consolidate a fair amount of code (particularly
> > > > in bpf), and they won't have to burn a full page per allocation anymore.
> > > >
> > > > bpf has a neat trick where it maps the same page in two different
> > > > locations, one is the executable location and the other is the writeable
> > > > location - I'm stealing that.
> > >
> > > How does that avoid the problem of being able to construct an arbitrary
> > > gadget that somebody else will then execute? IOW, what bpf has done
> > > seems like it's working around & undoing the security improvements.
> > >
> > > I suppose it's an improvement that only the executable address is
> > > passed back to the caller, and not the writable address.
> >
> > Ok, here's what I came up with. Have not tested all corner cases, still
> > need to write docs - but I think this gives us a nicer interface than
> > what bpf/kprobes/etc. have been doing, and it does the sub-page sized
> > allocations I need.
> >
> > With an additional tweak to module_alloc() (not done in this patch yet)
> > we avoid ever mapping in pages both writeable and executable:
> >
> > -->--
> >
> > From 6eeb6b8ef4271ea1a8d9cac7fbaeeb7704951976 Mon Sep 17 00:00:00 2001
> > From: Kent Overstreet
> > Date: Wed, 17 May 2023 01:22:06 -0400
> > Subject: [PATCH] mm: jit/text allocator
> >
> > This provides a new, very simple slab allocator for jit/text, i.e. bpf,
> > ftrace trampolines, or bcachefs unpack functions.
> >
> > With this API we can avoid ever mapping pages both writeable and
> > executable (not implemented in this patch: need to tweak
> > module_alloc()), and it also supports sub-page sized allocations.
>
> This looks like yet another workaround for that module_alloc() was not
> designed to handle permission changes. Rather than create more and more
> wrappers for module_alloc() we need to have core API for code allocation,
> apparently on top of vmalloc, and then use that API for modules, bpf,
> tracing and whatnot.
>
> There was quite lengthy discussion about how to handle code allocations
> here:
>
> https://lore.kernel.org/linux-mm/20221107223921.3451913-1-song@kernel.org/

Thanks for the link! Added Song to the CC.

Song, I'm looking at your code now - switching to hugepages is great,
but I wonder if we might be able to combine our two approaches - with
the slab allocator I did, do we have to bother with VMAs at all? And
then it gets us sub-page sized allocations.

> and Song is already working on improvements for module_alloc(), e.g. see
> commit ac3b43283923 ("module: replace module_layout with module_memory")
>
> Another thing, the code below will not even compile on !x86.

Due to text_poke(), which I see is abstracted better in that patchset.

I'm very curious why text_poke() does tlb flushing at all; it seems like
flush_icache_range() is actually what's needed?

text_poke() also only touching up to two pages, without that being
documented, is also a footgun...

And I'm really curious why text_poke() is needed at all. Seems like we
could just use kmap_local() to create a temporary writeable mapping,
except in my testing that got me a RO mapping. Odd.
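
For anyone following along, here is a rough userspace analogue of the
idea discussed in this thread: the same memory mapped twice, once
writeable and once executable, with sub-page allocations handed out
from the executable alias only. It is a sketch under stated
assumptions, not the kernel patch: jit_pool, jit_pool_init(),
jit_alloc(), JIT_POOL_SIZE and JIT_SLOT_SIZE are hypothetical names,
memfd_create() stands in for the kernel's page allocation, a bump
pointer stands in for the slab, and __builtin___clear_cache() (a
GCC/Clang builtin) stands in for the icache maintenance that
flush_icache_range() would do in the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* All names below are hypothetical; this is not the API from the patch. */
#define JIT_POOL_SIZE 4096u	/* one pool backs many small allocations */
#define JIT_SLOT_SIZE 64u	/* allocation granularity, well below a page */

struct jit_pool {
	unsigned char *rw;	/* writeable alias, private to the allocator */
	unsigned char *rx;	/* read+exec alias, the only address callers get */
	size_t used;		/* bump pointer in bytes, multiple of JIT_SLOT_SIZE */
};

/* Map one anonymous file twice: no single mapping is both W and X. */
static int jit_pool_init(struct jit_pool *p)
{
	int fd = memfd_create("jit_pool", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, JIT_POOL_SIZE) < 0) {
		close(fd);
		return -1;
	}
	p->rw = mmap(NULL, JIT_POOL_SIZE, PROT_READ | PROT_WRITE,
		     MAP_SHARED, fd, 0);
	p->rx = mmap(NULL, JIT_POOL_SIZE, PROT_READ | PROT_EXEC,
		     MAP_SHARED, fd, 0);
	close(fd);		/* the mappings keep the memory alive */

	if (p->rw == MAP_FAILED || p->rx == MAP_FAILED) {
		if (p->rw != MAP_FAILED)
			munmap(p->rw, JIT_POOL_SIZE);
		if (p->rx != MAP_FAILED)
			munmap(p->rx, JIT_POOL_SIZE);
		return -1;
	}
	p->used = 0;
	return 0;
}

/*
 * Copy @buf into the pool through the writeable alias and return the
 * matching address in the executable alias, or NULL if it does not fit.
 */
static void *jit_alloc(struct jit_pool *p, const void *buf, size_t len)
{
	size_t need;
	void *ret;

	if (len == 0 || len > JIT_POOL_SIZE)
		return NULL;
	/* Round up to whole slots so allocations never share a slot. */
	need = (len + JIT_SLOT_SIZE - 1) / JIT_SLOT_SIZE * JIT_SLOT_SIZE;
	if (need > JIT_POOL_SIZE - p->used)
		return NULL;
	assert(p->used % JIT_SLOT_SIZE == 0);

	memcpy(p->rw + p->used, buf, len);
	ret = p->rx + p->used;
	/* Make the new bytes visible to instruction fetch where required. */
	__builtin___clear_cache((char *)ret, (char *)ret + len);
	p->used += need;
	return ret;
}
```

A caller would run jit_pool_init() once and then pass generated code to
jit_alloc(); the returned pointer can be read or called but not written
through. This sketch never frees slots and does nothing to hide the
writeable alias, so it does not address the gadget concern raised
earlier in the thread.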