From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.zytor.com (terminus.zytor.com [198.137.202.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 715F6202C39; Sat, 22 Feb 2025 20:54:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.136 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740257701; cv=none; b=Ka6R5hW0UR0NMj5U5KJzgJTDQs1BLtxYBRNMlDruLIXEUIeuGG9fFWrdoFHv6lBFxPfOrIPjMkI/lJoh3u1o3GvnMQeuco9PGenqVKfypRcDyhxgAL+v4dk9Apd4fDpjTRGIzD25PcFMlWMpP9pHqT3L7jvd1J/ssDtOSIRxw5w= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740257701; c=relaxed/simple; bh=iyj/RWNW0z1QisL95Uk/XOY7VdhZ8WzFbkNS2sGCmtM=; h=Date:From:To:CC:Subject:In-Reply-To:References:Message-ID: MIME-Version:Content-Type; b=Mlk4dIpltk95rQpQ7mXDUeNhPRVLaXO8VCCB5lNRZV3kfJXAJSq9ARbtd98k+llNSaBMVu3iP017QzhbAGdaIJeEY+4xc0GFO2o/77ZrJYehch2q/A5/kMARz09A1o93j9SzHYM7mVcH8f3xMPtUMZ/H2y5Z+FdXZ6TGNZ8yozI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=zytor.com; spf=pass smtp.mailfrom=zytor.com; dkim=pass (2048-bit key) header.d=zytor.com header.i=@zytor.com header.b=hzvN6XoC; arc=none smtp.client-ip=198.137.202.136 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=zytor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=zytor.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=zytor.com header.i=@zytor.com header.b="hzvN6XoC" Received: from [127.0.0.1] ([76.133.66.138]) (authenticated bits=0) by mail.zytor.com (8.18.1/8.17.1) with ESMTPSA id 51MKsVvb3810618 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Sat, 22 Feb 2025 12:54:31 -0800 DKIM-Filter: OpenDKIM Filter v2.11.0 mail.zytor.com 51MKsVvb3810618 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zytor.com; s=2025021701; t=1740257672; bh=iyj/RWNW0z1QisL95Uk/XOY7VdhZ8WzFbkNS2sGCmtM=; h=Date:From:To:CC:Subject:In-Reply-To:References:From; b=hzvN6XoC8+bbRgrhLeh8EDDTRyv+650KdHpz+DbBFhXVDD26AMcu3IsRg4CTNtMON gZbSLfkbY+vZ6DqGnKF8ssHNV24zR0ys/DGxj0ekSE1TdJaIKeAHwqKx3a+IGpKYKt B5o+x+SHHKu+igr8Ny8Es7vH6O6AhvFQlXzwNZB1gmf36ToTa6C63QXF+4CNry2J3J swu+zKzy/sdCWYYJTeInIJ6yZsnJRj+4xXUyZtMgz6ji5+GzttFSwGA/ZaUFkR2lMG X7dqOP8uPZlb3UvKm0tk/9x4NJChFCZ4zhMkwum9SR9zKsN5/Ez4RywnFMieGpA8ty c/It6vQwxzGVg== Date: Sat, 22 Feb 2025 12:54:31 -0800 From: "H. Peter Anvin" To: Kent Overstreet , Linus Torvalds CC: Ventura Jack , Gary Guo , airlied@gmail.com, boqun.feng@gmail.com, david.laight.linux@gmail.com, ej@inai.de, gregkh@linuxfoundation.org, hch@infradead.org, ksummit@lists.linux.dev, linux-kernel@vger.kernel.org, miguel.ojeda.sandonis@gmail.com, rust-for-linux@vger.kernel.org Subject: Re: C aggregate passing (Rust kernel policy) User-Agent: K-9 Mail for Android In-Reply-To: References: <20250222141521.1fe24871@eugeo> <6pwjvkejyw2wjxobu6ffeyolkk2fppuuvyrzqpigchqzhclnhm@v5zhfpmirk2c> Message-ID: <6EFFB41B-9145-496E-8217-07AF404BE695@zytor.com> Precedence: bulk X-Mailing-List: ksummit@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On February 22, 2025 12:00:04 PM PST, Kent Overstreet wrote: >On Sat, Feb 22, 2025 at 11:18:33AM -0800, Linus Torvalds wrote: >> On Sat, 22 Feb 2025 at 10:54, Kent Overstreet wrote: >> > >> > If that work is successful it could lead to significant improvements = in >> > code generation, since aliasing causes a lot of unnecessary spills an= d >> > reloads - VLIW could finally become practical=2E >>=20 >> No=2E >>=20 >> Compiler people think aliasing matters=2E It very seldom does=2E And VL= IW >> will never become practical for entirely unrelated reasons (read: OoO >> is fundamentally superior to VLIW in general purpose computing)=2E > >OoO and VLIW are orthogonal, not exclusive, and we always want to go >wider, if we can=2E Separately, neverending gift that is Spectre should b= e >making everyone reconsider how reliant we've become on OoO=2E > >We'll never get rid of OoO, I agree on that point=2E But I think it's >worth some thought experiments about how many branches actually need to >be there vs=2E how many are there because everyone's assumed "branches ar= e >cheap! (so it's totally fine if the CPU sucks at the alternatives)" on >both the hardware and software side=2E > >e=2Eg=2E cmov historically sucked (and may still, I don't know), but a _l= ot_ >of branches should just be dumb ALU ops=2E I wince at a lot of the >assembly I see gcc generate for e=2Eg=2E short multiword integer >comparisons, there are a ton of places where it'll emit 3 or 5 branches >where 1 is all you need if we had better ALU primitives=2E > >> Aliasing is one of those bug-bears where compiler people can make >> trivial code optimizations that look really impressive=2E So compiler >> people *love* having simplistic aliasing rules that don't require real >> analysis, because the real analysis is hard (not just expensive, but >> basically unsolvable)=2E > >I don't think crazy compiler experiments from crazy C people have much >relevance, here=2E I'm talking about if/when Rust is able to get this >right=2E > >> The C standards body has been much too eager to embrace "undefined beha= vior"=2E > >Agree on C, but for the rest I think you're just failing to imagine what >we could have if everything wasn't tied to a language with >broken/missing semantics w=2Er=2Et=2E aliasing=2E > >Yes, C will never get a memory model that gets rid of the spills and >reloads=2E But Rust just might=2E It's got the right model at the referen= ce >level, we just need to see if they can push that down to raw pointers in >unsafe code=2E > >But consider what the world would look like if Rust fixes aliasing and >we get a microarchitecture that's able to take advantage of it=2E Do a >microarchitecture that focuses some on ALU ops to get rid of as many >branches as possible (e=2Eg=2E min/max, all your range checks that don't >trap), get rid of loads and spills from aliasing so you're primarily >running out of registers - and now you _do_ have enough instructions in >a basic block, with fixed latency, that you can schedule at compile time >to make VLIW worth it=2E > >I don't think it's that big of a leap=2E Lack of cooperation between >hardware and compiler folks (and the fact that what the hardware people >wanted was impossible at the time) was what killed Itanium, so if you >fix those two things=2E=2E=2E > >> The kernel basically turns all that off, as much as possible=2E Overflo= w >> isn't undefined in the kernel=2E Aliasing isn't undefined in the kernel= =2E >> Things like that=2E > >Yeah, the religion of undefined behaviour in C has been an absolute >nightmare=2E > >It's not just the compiler folks though, that way of thinking has >infected entirely too many people people in kernel and userspace - >"performance is the holy grail and all that matters and thou shalt shave >every single damn instruction"=2E > >Where this really comes up for me is assertions, because we're not >giving great guidance there=2E It's always better to hit an assertion tha= n >walk off into undefined behaviour la la land, but people see "thou shalt >not crash the kernel" as a reason not to use BUG_ON() when it _should_ >just mean "always handle the error if you can't prove that it can't >happen"=2E > >> When 'integer overflow' means that you can _sometimes_ remove one >> single ALU operation in *some* loops, but the cost of it is that you >> potentially introduced some seriously subtle security bugs, I think we >> know it was the wrong thing to do=2E > >And those branches just _do not matter_ in practice, since if one side >leads to a trap they're perfectly predicted and to a first approximation >we're always bottlenecked on memory=2E > VLIW and OoO might seem orthogonal, but they aren't =E2=80=93 because they= are trying to solve the same problem, combining them either means the OoO = engine can't do a very good job because of false dependencies (if you are s= cheduling molecules) or you have to break them instructions down into atoms= , at which point it is just a (often quite inefficient) RISC encoding=2E In= short, VLIW *might* make sense when you are statically scheduling a known = pipeline, but it is basically a dead end for evolution =E2=80=93 so unless = you can JIT your code for each new chip generation=2E=2E=2E But OoO still is more powerful, because it can do *dynamic* scheduling=2E = A cache miss doesn't necessarily mean that you have to stop the entire mach= ine, for example=2E