From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-lj1-f171.google.com (mail-lj1-f171.google.com [209.85.208.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DFAEC1DF73C for ; Thu, 27 Feb 2025 17:34:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.171 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740677649; cv=none; b=D3PR9aq0+dOTxgNFauOb2d0fjeboy6oV6S3r6YGFXEegbvL99p8kigtGQt355VQynBo5YoBnfw9L83kybjiEe9yxhNZWlJV9P7ozBZb9wTP+6vHq5MjjgBjoBkH4CokFNvPNt89cZ1Pcowa/8vbX0aihO8RlA9SLFse5KjGkQLM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740677649; c=relaxed/simple; bh=5io/u43OHyQ2ai/jTUAdQxiaUovK4D7VNZCRIf2uOfU=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=K10XfSx/J8LmqtB0yAbHEgWeBdu0Y7D+CT3rBxwoqWzi1j6JGcD8Ju43Vo8KmVYevFLINxBvFhRtK27iByRuHzwkT4+ZftbWkHgjdSsPCw1E4NQARnpaqA4E49uq3nXNXnoHHJfkCiUOIXy9/OnA5wLwm+d1OC7knEKt7qH3EWw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=BetAO8rU; arc=none smtp.client-ip=209.85.208.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BetAO8rU" Received: by mail-lj1-f171.google.com with SMTP id 38308e7fff4ca-30918c29da2so13782761fa.0 for ; Thu, 27 Feb 2025 09:34:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1740677645; x=1741282445; darn=lists.linux.dev; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=zV0HMykXZypVpGQkkQGaLeN+qp7pfqOToj9ZBxVcMKY=; b=BetAO8rUVauxKZCi8jVP/aoHvwVcT4XmUDdklEHq9SV20F8gbd8e5cprWC4b+xEEoB Cb2OHLxD2bcdWxyAXszN3XnkfIg+p+6xAXI0R/w/Hn0uCnoKgeki0Ac0zHVz2UTzd5m3 xQyWL2ZtnQhbdJbbsHe+OQJQiq8AOJU0ohhEepM0Gden6Tk3W7KpEGO1CB8csmxeei6T /w/o/tCgr2YeuKapiKNRp22MJxGG/u7nPDeG/0xjTl9wAAd4Vf/P138lIjRwY3OZk03Q 1paH3HsZWcWLp9TTzTY+hGYi16nDtPVaqUUH4ZKmqnM/Dbvein2MZwL4QWjfrGphfkwS Kc0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740677645; x=1741282445; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zV0HMykXZypVpGQkkQGaLeN+qp7pfqOToj9ZBxVcMKY=; b=eQbnk7/z/jdzOJYaZexmzRmxizyG91Df9nDi7a4GwVdIM8ZfW3nNhw5oSwYjbe5huB N/6cFEZLd8jfeZDYtkGitk6SxJ3GLe3gfjAR199FzDqs6L2ZrbGl866wPdbqR1FghbzP nCw7raaQKKjxrNXXkiTWZtTI6eMDWQ3I8IF4fDZ+7OgawSbFehbM8Uq0Px71K/KHVOqH 7vg1yNSLr87WEMfnRhLYOgOJ7N8nJkNcQdVlkJs7YdlE1Humx1g+f1l8uA0gZZOAFqA7 aiKqsU/mfKzUYJFkXVPAM9Xr4v8vxrRRZ10Sl9Q3tGkC4KDkYWeqQaRLdvigxFYSWoGd u81w== X-Forwarded-Encrypted: i=1; AJvYcCUgbfyqX1uKZRHePiHc+5UIp+egtZfjP2OT+DajnIC8hBkGup/kIqufjn2M8IOAPH7Q9xXUnyBn@lists.linux.dev X-Gm-Message-State: AOJu0YxGqYW/pYW5DoCIhbD/GKTeOvFECsQlIlAv0urcB/5319VnkWWM IT54bWne79xoxUtlf8tSGZfHkv528038LXtCoo6tBomQtGa/uH/lR8mz0sI7M8fHz5KcfCPYxWE hA4oGGUbfBdXXUHdQDtozymgd4jM= X-Gm-Gg: ASbGncv7eXqUkZUFrwCfhJ0iFlP7FeWB8iAiTPOWqcmV86eJuwLu/zus9Ori2gRZE9N 3C3gz+S6I5OHgEQYY4FqXV3JvVjmSS6SVjqXf7aXlPo+GF1WroLOffSMErsWqA6Mz32DruUz/DE D80mN84BTT X-Google-Smtp-Source: AGHT+IE2BH/1NSXRzDXlElgcsEfByXyUcbKRKGR78hEZv7EvEhyk5TgAq8IxZXZq0wHhsseeHK9uPd1WEEbhefpqOR4= X-Received: by 2002:a2e:3c0b:0:b0:308:ffa1:890b with SMTP id 38308e7fff4ca-30a80be24bdmr57978641fa.2.1740677644588; Thu, 27 Feb 2025 09:34:04 -0800 (PST) Precedence: bulk X-Mailing-List: ksummit@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20250222141521.1fe24871@eugeo> <6pwjvkejyw2wjxobu6ffeyolkk2fppuuvyrzqpigchqzhclnhm@v5zhfpmirk2c> <780ff858-4f8e-424f-b40c-b9634407dce3@ralfj.de> <7edf8624-c9a0-4d8d-a09e-2eac55dc6fc5@ralfj.de> In-Reply-To: <7edf8624-c9a0-4d8d-a09e-2eac55dc6fc5@ralfj.de> From: Ventura Jack Date: Thu, 27 Feb 2025 10:33:51 -0700 X-Gm-Features: AQ5f1JqN3AWgV8meR08A05wz9Nzi0d-mCixv49weqXaO89Jr7-WdZ80Dn-oXMRY Message-ID: Subject: Re: C aggregate passing (Rust kernel policy) To: Ralf Jung Cc: Kent Overstreet , Miguel Ojeda , Gary Guo , torvalds@linux-foundation.org, airlied@gmail.com, boqun.feng@gmail.com, david.laight.linux@gmail.com, ej@inai.de, gregkh@linuxfoundation.org, hch@infradead.org, hpa@zytor.com, ksummit@lists.linux.dev, linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Feb 26, 2025 at 3:28=E2=80=AFPM Ralf Jung wrote: > > Hi all, > > On 26.02.25 19:09, Ventura Jack wrote: > > Is Miri the only one of its kind in the programming world? > > There are not many system languages in mass use, and > > those are the languages that first and foremost deal > > with undefined behavior. That would make Miri extra impressive. > > I am not aware of a comparable tool that would be in wide-spread use, or = that is > carefully aligned with the semantics of an actual compiler. > For C, there is Cerberus (https://www.cl.cam.ac.uk/~pes20/cerberus/) as a= n > executable version of the C specification, but it can only run tiny examp= les. > The verified CompCert compiler comes with a semantics one could interpret= , but > that only checks code for compatibility with CompCert C, which has a lot = less > (and a bit more) UB than real C. > There are also two efforts that turned into commercial tools that I have = not > tried, and for which there is hardly any documentation of how they interp= ret the > C standard so it's not clear what a green light from them means when comp= iling > with gcc or clang. I also don't know how much real-world code they can ac= tually run. > - TrustInSoft/tis-interpreter, mostly gone from the web but still availab= le in > the wayback machine > (https://web.archive.org/web/20200804061411/https://github.com/TrustInSof= t/tis-interpreter/); > I assume this got integrated into their "TrustInSoft Analyzer" product. > - kcc, a K-framework based formalization of C that is executable. The pub= lic > repo is dead (https://github.com/kframework/c-semantics) and when I tried= to > build their tool that didn't work. The people behind this have a company = that > offers "RV-Match" as a commercial product claiming to find bugs in C base= d on "a > complete formal ISO C11 semantics" so I guess that is where their efforts= go now. > > For C++ and Zig, I am not aware of anything comparable. > > Part of the problem is that in C, 2 people will have 3 ideas for what the > standard means. Compiler writers and programmers regularly have wildly > conflicting ideas of what is and is not allowed. There are many different= places > in the standard that have to be scanned to answer "is this well-defined" = even > for very simple programs. (https://godbolt.org/z/rjaWc6EzG is one of my f= avorite > examples.) A tool can check a single well-defined semantics, but who gets= to > decide what exactly those semantics are? > Formalizing the C standard requires extensive interpretation, so I am ske= ptical > of everyone who claims that they "formalized the C standard" and built a = tool on > that without extensive evaluation of how their formalization compares to = what > compilers do and what programmers rely on. The Cerberus people have done = that > evaluation (see e.g. https://dl.acm.org/doi/10.1145/2980983.2908081), but= none > of the other efforts have (to my knowledge). Ideally such a formalization= effort > would be done in close collaboration with compiler authors and the commit= tee so > that the ambiguities in the standard can be resolved and the formalizatio= n > becomes the one canonical interpretation. The Cerberus people are the one= s that > pushed the C provenance formalization through, so they made great progres= s here. > However, many issues remain, some quite long-standing (e.g. > https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm and > https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm, which in my = eyes > never got properly resolved by clarifying the standard). Martin and a few= others > are slowly pushing things in the right direction, but it takes a long tim= e. > Rust, by having a single project in charge of the one canonical implement= ation > and the specification, and having an open process that is well-suited for > incorporating user concerns, can move a lot quicker here. C has a huge > head-start, Rust has nothing like the C standard, but we are catching up = -- and > our goal is more ambitious than that; we are doing our best to learn from= C and > C++ and concluded that that style of specification is too prone to ambigu= ity, so > we are trying to achieve a formally precise unambiguous specification. Wa= sm > shows that this can be done, at industry scale, albeit for a small langua= ge -- > time we do it for a large one. :) > > So, yes I think Miri is fairly unique. But please let me know if I missed= something! > > (As an aside, the above hopefully also explains why some people in Rust a= re > concerned about alternative implementations. We do *not* want the current > de-factor behavior to ossify and become the specification. We do *not* wa= nt the > specification to just be a description of what the existing implementatio= ns at > the time happen to do, and declare all behavior differences to be UB or > unspecified or so just because no implementation is willing to adjust the= ir > behavior to match the rest. We want the specification to be prescriptive,= not > descriptive, and we will adjust the implementation as we see fit to achie= ve that > -- within the scope of Rust's stability guarantees. That's also why we ar= e so > cagey about spelling out the aliasing rules until we are sure we have a g= ood > enough model.) Very interesting, thank you for the exhaustive answer. Might it be accurate to categorize Miri as a "formal-semantics-based undefined-behavior-detecting interpreter"? >https://godbolt.org/z/rjaWc6EzG That example uses a compiler-specific attribute AFAIK, namely __attribute__((noinline)) When using compiler-specific attributes and options, the original language is arguably no longer being used, depending on the attribute. Though a language being inexpressive and possibly requiring compiler extensions to achieve some goals, possibly like in this C example, can be a disadvantage in itself. > [On formalization] I agree that Rust has some advantages in regards to formalization, but some of them that I think of, are different from what you mention here. And I also see some disadvantages. C is an ancient language, and parsing and handling C is made more complex by the preprocessor. Rust is a much younger language that avoided all that pain, and is easier to parse and handle. C++ is way worse, though might become closer to Rust with C++ modules. Rust is more willing to break existing code in projects, causing previously compiling projects to no longer compile. rustc does this rarely, but it has happened, also long after Rust 1.0. >From last year, 2024. https://internals.rust-lang.org/t/type-inference-breakage-in-1-80-has-n= ot-been-handled-well/21374 "Rust 1.80 broke builds of almost all versions of the very popular time crate (edit: please don't shoot the messenger in that GitHub thread!!!) Rust has left only a 4-month old version working. That was not enough time for the entire Rust ecosystem to update, especially that older versions of time were not yanked, and users had no advance warning that it will stop working. A crater run found a regression in over 5000 crates, and that has just been accepted as okay without any further action! This is below the level of stability and reliability that Rust should have." If C was willing to break code as much as Rust, it would be easier to clean up C. There is the Rust feature "editions", which is interesting, but in my opinion also very experimental from a programming language theory perspective. It does help avoid breakage while letting the languages developers clean up the language and improve it, but has some other consequences, such as source code having different semantics in different editions. Automated upgrade tools help with this, but does not handle all consequences. If C was made from scratch today, by experts at type theory, then C would likely have a much simpler type system and type checking than Rust, and would likely be much easier to formalize. Programs in C would likely still often be more complex than in C++ or Rust, however. >[Omitted] We do *not* want the > specification to just be a description of what the existing implementatio= ns at > the time happen to do, and declare all behavior differences to be UB or > unspecified or so just because no implementation is willing to adjust the= ir > behavior to match the rest. [Omitted] I have seen some Rust proponents literally say that there is a specification for Rust, and that it is called rustc/LLVM. Though those specific individuals may not have been the most credible individuals. A fear I have is that there may be hidden reliance in multiple different ways on LLVM, as well as on rustc. Maybe even very deeply so. The complexity of Rust's type system and rustc's type system checking makes me more worried about this point. If there are hidden elements, they may turn out to be very difficult to fix, especially if they are discovered to be fundamental. While having one compiler can be an advantage in some ways, it can arguably be a disadvantage in some other ways, as you acknowledge as well if I understand you correctly. You mention ossifying, but the more popular Rust becomes, the more painful breakage will be, and the less suited Rust will be as a research language. Using Crater to test existing Rust projects with, as you mention later in your email, is an interesting and possibly very valuable approach, but I do not know its limitations and disadvantages. Some projects will be closed source, and thus will presumably not be checked, as I understand it. Does Crater run Rust for Linux and relevant Rust kernel code? I hope that any new language at least has its language developers ensure that they have a type system that is formalized and proven correct before that langauge's 1.0 release. Since fixing a type system later can be difficult or practically impossible. A complex type system and complex type checking can be a larger risk in this regard relative to a simple type system and simple type checking, especially the more time passes and the more the language is used and have code written in it, making it more difficult to fix the language due to code breakage costing more. Some languages that broke backwards compatibility arguably suffered or died because of it, like Perl 6 or Scala 3. Python 2 to 3 was arguably successful but painful. Scala 3 even had automated conversion tools AFAIK. > > There are some issues in Rust that I am curious as to > > your views on. rustc or the Rust language has some type > > system holes, which still causes problems for rustc and > > their developers. > > > > https://github.com/lcnr/solver-woes/issues/1 > > https://github.com/rust-lang/rust/issues/75992 > > > > Those kinds of issues seem difficult to solve. > > > > In your opinion, is it accurate to say that the Rust language > > developers are working on a new type system for > > Rust-the-language and a new solver for rustc, and that > > they are trying to make the new type system and new solver > > as backwards compatible as possible? > > It's not really a new type system. It's a new implementation for the same= type > system. But yes there is work on a new "solver" (that I am not involved i= n) that > should finally fix some of the long-standing type system bugs. Specifical= ly, > this is a "trait solver", i.e. it is the component responsible for dealin= g with > trait constraints. Due to some unfortunate corner-case behaviors of the o= ld, > organically grown solver, it's very hard to do this in a backwards-compat= ible > way, but we have infrastructure for extensive ecosystem-wide testing to j= udge > the consequences of any given potential breaking change and ensure that a= lmost > all existing code keeps working. In fact, Rust 1.84 already started using= the > new solver for some things > (https://blog.rust-lang.org/2025/01/09/Rust-1.84.0.html) -- did you notic= e? > Hopefully not. :) If it is not a new type system, why then do they talk about backwards compatibility for existing Rust projects? If the type system is not changed, existing projects would still type check. And in this repository of one of the main Rust language developers as I understand it, several issues are labeled with "S-fear". https://github.com/lcnr/solver-woes/issues They have also been working on this new solver for several years. Reading through the issues, a lot of the problems seem very hard. Best, VJ.