Date: Tue, 20 Jun 2023 15:32:44 -0700
From: "Andy Lutomirski"
To: "Dave Hansen", "Kent Overstreet"
Cc: "Mark Rutland", "Linux Kernel Mailing List", linux-fsdevel@vger.kernel.org, "linux-bcachefs@vger.kernel.org", "Kent Overstreet", "Andrew Morton", "Uladzislau Rezki", "hch@infradead.org", linux-mm@kvack.org, "Kees Cook", "the arch/x86 maintainers"
Subject: Re: [PATCH 07/32] mm: Bring back vmalloc_exec
References: <20230509165657.1735798-1-kent.overstreet@linux.dev> <20230509165657.1735798-8-kent.overstreet@linux.dev> <20230619104717.3jvy77y3quou46u3@moria.home.lan> <20230619191740.2qmlza3inwycljih@moria.home.lan> <5ef2246b-9fe5-4206-acf0-0ce1f4469e6c@app.fastmail.com> <20230620180839.oodfav5cz234pph7@moria.home.lan> <37d2378e-72de-e474-5e25-656b691384ba@intel.com>

On Tue, Jun 20, 2023, at 1:42 PM, Andy Lutomirski wrote:
> Hi all-
>
> On Tue, Jun 20, 2023, at 11:48 AM, Dave Hansen wrote:
>>>> No, I'm saying your concerns are baseless and too vague to
>>>> address.
>>> If you don't address them, the NAK will stand forever, or at least
>>> until a different group of people take over x86 maintainership.
>>> That's fine with me.
>>
>> I've got a specific concern: I don't see vmalloc_exec() used in this
>> series anywhere.  I also don't see any of the actual assembly that's
>> being generated, or the glue code that's calling into the generated
>> assembly.
>>
>> I grepped around a bit in your git trees, but I also couldn't find it in
>> there.  Any chance you could help a guy out and point us to some of the
>> specifics of this new, tiny JIT?
>>
>
> So I had a nice discussion with Kent on IRC, and, for the benefit of
> everyone else reading along, I *think* the JITted code can be replaced
> by a table-driven approach like this:
>
> typedef unsigned int u32;
> typedef unsigned long u64;
>
> struct uncompressed
> {
>     u32 a;
>     u32 b;
>     u64 c;
>     u64 d;
>     u64 e;
>     u64 f;
> };
>
> struct bitblock
> {
>     u64 source;
>     u64 target;
>     u64 mask;
>     int shift;
> };
>
> // out needs to be zeroed first
> void unpack(struct uncompressed *out, const u64 *in,
>             const struct bitblock *blocks, int nblocks)
> {
>     u64 *out_as_words = (u64*)out;
>     for (int i = 0; i < nblocks; i++) {
>         const struct bitblock *b = &blocks[i];
>         out_as_words[b->target] |= (in[b->source] & b->mask) << b->shift;
>     }
> }
>
> void apply_offsets(struct uncompressed *out, const struct uncompressed *offsets)
> {
>     out->a += offsets->a;
>     out->b += offsets->b;
>     out->c += offsets->c;
>     out->d += offsets->d;
>     out->e += offsets->e;
>     out->f += offsets->f;
> }
>
> Which generates nice code: https://godbolt.org/z/3fEq37hf5

Thinking about this a bit more, I think the only real performance issue
with my code is that it does 12 read-OR-write operations in memory, which
all depend on each other in horrible ways.

If it's reversed so that each field is assembled in registers and the
stores are all in order, then this issue would go away.

typedef unsigned int u32;
typedef unsigned long u64;

struct uncompressed
{
    u32 a;
    u32 b;
    u64 c;
    u64 d;
    u64 e;
    u64 f;
};

struct field_piece
{
    int source;
    int shift;
    u64 mask;
};

struct field_pieces
{
    struct field_piece pieces[2];
    u64 offset;
};

u64 unpack_one(const u64 *in, const struct field_pieces *pieces)
{
    const struct field_piece *p = pieces->pieces;
    return (((in[p[0].source] & p[0].mask) << p[0].shift) |
            ((in[p[1].source] & p[1].mask) << p[1].shift)) +
           pieces->offset;
}

struct encoding
{
    struct field_pieces a, b, c, d, e, f;
};

void unpack(struct uncompressed *out, const u64 *in,
            const struct encoding *encoding)
{
    out->a = unpack_one(in, &encoding->a);
    out->b = unpack_one(in, &encoding->b);
    out->c = unpack_one(in, &encoding->c);
    out->d = unpack_one(in, &encoding->d);
    out->e = unpack_one(in, &encoding->e);
    out->f = unpack_one(in, &encoding->f);
}

https://godbolt.org/z/srsfcGK4j

Could be faster.  Probably worth testing.
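
For anyone who wants to poke at this, here is a minimal sketch of how such a table might be filled in and exercised. The layout is entirely made up for illustration (it is not the bcachefs key format): example_encoding, example_unpack() and the input words are hypothetical names and values. Since unpack_one() as posted only shifts left, the sketch assumes every piece sits in the low bits of its source word. It also swaps the typedefs for <stdint.h> types so that u64 is 64 bits on any target.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct uncompressed { u32 a; u32 b; u64 c; u64 d; u64 e; u64 f; };

struct field_piece { int source; int shift; u64 mask; };
struct field_pieces { struct field_piece pieces[2]; u64 offset; };
struct encoding { struct field_pieces a, b, c, d, e, f; };

/* Same logic as the code posted above: OR two masked pieces, add the offset. */
static u64 unpack_one(const u64 *in, const struct field_pieces *pieces)
{
	const struct field_piece *p = pieces->pieces;

	return (((in[p[0].source] & p[0].mask) << p[0].shift) |
		((in[p[1].source] & p[1].mask) << p[1].shift)) + pieces->offset;
}

static void unpack(struct uncompressed *out, const u64 *in,
		   const struct encoding *encoding)
{
	out->a = unpack_one(in, &encoding->a);
	out->b = unpack_one(in, &encoding->b);
	out->c = unpack_one(in, &encoding->c);
	out->d = unpack_one(in, &encoding->d);
	out->e = unpack_one(in, &encoding->e);
	out->f = unpack_one(in, &encoding->f);
}

/*
 * Hypothetical layout, for illustration only:
 *   a: low 16 bits of in[0]
 *   b: low 8 bits of in[1], stored relative to an offset of 100
 *   c: low 32 bits from in[2], next 8 bits from the low byte of in[3]
 *   d: not stored at all, always the constant 7
 *   e, f: not stored, always 0 (all-zero table entries)
 * An unused piece has mask 0, so it contributes nothing to the OR.
 */
static const struct encoding example_encoding = {
	.a = { .pieces = { { .source = 0, .shift = 0, .mask = 0xffff } } },
	.b = { .pieces = { { .source = 1, .shift = 0, .mask = 0xff } },
	       .offset = 100 },
	.c = { .pieces = { { .source = 2, .shift = 0,  .mask = 0xffffffff },
			   { .source = 3, .shift = 32, .mask = 0xff } } },
	.d = { .offset = 7 },
};

/* Unpack one made-up packed key; the high bits of in[0] are junk on purpose. */
static struct uncompressed example_unpack(void)
{
	static const u64 in[4] = {
		0xdead000000001234ULL, 0x2a, 0x89abcdefULL, 0x01,
	};
	struct uncompressed out;

	unpack(&out, in, &example_encoding);
	return out;
}
```

With this table, every field is written exactly once, so the output does not need to be zeroed first, unlike the bitblock version. Fields that start in the middle of a word would need something extra (a right shift or a rotate per piece), which the sketch deliberately leaves out.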