From: ebiederm@xmission.com (Eric W. Biederman)
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [CORE TOPIC] More useful types in the linux kernel
Date: Tue, 19 Jul 2016 19:08:12 -0500 [thread overview]
Message-ID: <87lh0xf9xv.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <1468962539.2383.82.camel@HansenPartnership.com> (James Bottomley's message of "Tue, 19 Jul 2016 14:08:59 -0700")
James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> On Tue, 2016-07-19 at 10:32 -0500, Eric W. Biederman wrote:
>> Historically the types in C came about because the machines
>> fundamentally supported different data types either with different
>> sizes or different characteristics (i.e. u8, u16, float, double).
>> These data types and the C type system were built so programmers
>> could tell the machine what they needed it to do.
>>
>> There is another genesis of types that started with the simply typed
>> lambda calculs that is about eliminating common errors and otherwise
>> helping a programmer get their code right.
>>
>> In the years since C was invented there has been a lot of activity
>> and a
>> little bit of progress in this area. Would people be receptive to
>> improvements in this area?
>>
>> I would like to talk to folks and gague what it would take to make
>> improvements in this area acceptable, practical, and useful.
>>
>> Would a gcc plugin that checks the most interesting things that
>> sparse checks on every build be interesting? (endianness of integer
>> types for example)
>
> How would this be different from simply automatically running sparse in
> the kernel build if the binary is present (effectively making make C=1
> the default)?
Nothing. I am just honestly looking at ways that we can get things to
always or almost always run. Sparse isn't getting run regularly now so
I was suspect that would not be as good of a solution.
>> Would a type system for pointers derived from separation logic that
>> has the concept that a piece of data is owned by a piece of running
>> code rather than another piece of data be interesting?
>
> By this you mean a thread of execution that should be expected to free
> the data pointed to when it finishes? Sort of like a self garbage
> collecting reference?
Not really. The big idea is to make expressable in the type system the
key concepts on how a programmer reasons about their data structures,
instead of trying to make programmers perform some convoluted logic to
express things in concepts that are trivial to implement in a type
system.
Another way to talk about it would be complete alias analysis at type
checking time.
Which means that the type system then knows essentially everything the
programmer knows about aliases and object lifetimes and if you don't
pass something on or free it, the type checker then knows you made
a mistake.
>> * This cleanly allows for doubly linked lists.
>>
>> * This is useful to ensure that data is either put in another data
>> structure where it is remembered or it is freed.
>>
>> * This is useful to ensure reference counts are not leaked.
>>
>> * This is useful to ensure that every lock is paired with an
>> unlock.
>>
>> My personal filter for things like this are types that can be checked
>> in time proportional to the amount of code to be built so that it is
>> roughly the same situation we are in now.
>>
>>
>> Given it's heritage and it's history the type system in C has serious
>> limitations that I don't know if they are correctible, when it comes
>> to catching programmer mistakes: silent truncation of unsigned types
>> into smaller unsigned types, casts, etc. Would people be willing to
>> consider a simple, link compatible alternative to C for some of the
>> code in the kernel that had the same low level control of the machine
>> but had a type system that made catching mistakes easier?
>>
>>
>> Deploying solutions like this will take a fair bit of grunt work, and
>> time similar or worse than the big kernel lock removal. Given how
>> widely Linux is used and how annoying some of these bugs can be I
>> think it is worthwhile to dig in and see what kind of improvements
>> can be made.
>>
>> I would really like to get a feel among kernel maintainers and
>> developers if this is something that is interesting, and what kind of
>> constraints they think something like this would need to be usable
>> for the kernel?
>
> I've got to say that rewriting in some as yet undefined language simply
> to get better typing and conversions looks very daunting. Before even
> considering this, can you answer why an extension to sparse, which
> would mostly flag the problems if we use the correct annotations,
> wouldn't work just as well? We'd still have to add the additional
> annotations (and someone would have to update sparse) but it would
> still be a lot easier than rewriting and giving the kernel a build
> dependency on whatever the rewrite is done in.
That is a very good question. The short answer is that I am not yet
convinced I can figure out how to retrofit the type checks I want onto
C's type system. I have implemented the type checks I am looking at in
a fresh language I am toying with where I can throw out inconvinient
unnecessary corner cases. After I finished boiling the ocean the basic
types were similar enough to C's type system that I think I can make
the connection but I haven't tried yet.
Assuming things can be sorted so they coexist nicely C's types. No
advantage. If it turns out that to make the code not a pain to write we
need better syntax or some basic type inference there is a large
advantage in converting code.
The first place I can think of where I might get hung up in the area of
discriminated unions. I am pretty certain that some cases of type
safety will require them and C doesn't have anything like that today. A
practical example would be a type system that ensures you call
IS_ERR(ptr) before you use a pointer as a pointer.
Ordinary C unions and type casts that serve the same purpose would at the
end of the day all have to be reabstracted. Because otherwise memory
safety cound not be analyized.
Some of that reabstraction I will need to introduce type variables which
a completely foreing concept to C.
On the flip side since the linux kernel and low level programming where
precise control of the machine happens is my target audience it is
definitely worth looking at what can be done in a context like sparse.
That is the fastest way to connect the tools with the real world
problems. If it can't be done or if it can't be made nice simple
and easy to use then there will be good arguments for why we need a new
syntax.
Mostly I asked the question because the type system would be so much
easier in a green field setting, and I wanted to be lazy. So hearing
the common sense request (Please fix C.) Is helpful to get that lazy
part of myself in gear.
The end game for me is a type system that doesn't permit memory errors,
reference counting errors, locking errors, or deadlocks, while at the
same time allowing pretty much the code and data structures that we use
in the kernel today.
Eric
next prev parent reply other threads:[~2016-07-20 0:21 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-19 15:32 Eric W. Biederman
2016-07-19 17:31 ` Mark Brown
2016-07-19 18:52 ` Jiri Kosina
2016-07-19 20:39 ` Eric W. Biederman
2016-07-20 15:53 ` Mark Brown
2016-07-20 17:04 ` [Ksummit-discuss] [CORE TOPIC] [TECH TOPIC] Support (or move towards to) LLVM Jiri Kosina
2016-07-20 18:35 ` Alexey Dobriyan
2016-07-20 18:52 ` Mark Brown
2016-07-21 9:54 ` David Woodhouse
2016-07-21 13:41 ` Shuah Khan
2016-07-21 14:02 ` David Woodhouse
2016-07-21 16:21 ` Mark Brown
2016-07-23 3:28 ` Behan Webster
2016-07-21 18:38 ` Jiri Kosina
2016-07-21 20:47 ` Paul Turner
2016-07-26 11:22 ` David Woodhouse
2016-07-19 21:08 ` [Ksummit-discuss] [CORE TOPIC] More useful types in the linux kernel James Bottomley
2016-07-20 0:08 ` Eric W. Biederman [this message]
2016-07-20 7:32 ` Julia Lawall
2016-07-20 12:11 ` Jan Kara
2016-07-28 3:33 ` Steven Rostedt
2016-07-19 21:26 ` Josh Triplett
2016-07-20 2:36 ` Eric W. Biederman
2016-07-30 18:03 ` Eric W. Biederman
2016-07-30 18:49 ` Josh Triplett
2016-07-30 19:34 ` Eric W. Biederman
2016-07-30 20:56 ` Josh Triplett
2016-07-30 22:21 ` Eric W. Biederman
2016-07-21 15:05 ` David Howells
2016-07-21 23:33 ` Dmitry Torokhov
2016-07-22 6:00 ` Hannes Reinecke
2016-07-22 6:14 ` Julia Lawall
2016-07-22 13:57 ` Hannes Reinecke
2016-07-22 14:40 ` Julia Lawall
2016-07-22 19:12 ` Arnd Bergmann
2016-07-26 11:48 ` David Woodhouse
2016-07-26 12:53 ` Hannes Reinecke
2016-07-26 13:59 ` Alexey Dobriyan
2016-07-26 13:53 ` Alexey Dobriyan
2016-07-27 12:40 ` Julia Lawall
2016-07-27 13:25 ` James Bottomley
2016-07-27 13:33 ` David Woodhouse
2016-07-27 17:21 ` Bird, Timothy
2016-08-01 22:17 ` Rob Herring
2016-08-12 1:29 ` Stephen Boyd
2016-08-11 15:44 ` Dan Carpenter
2016-08-12 0:38 ` NeilBrown
2016-08-12 20:56 ` Dan Carpenter
2016-08-12 3:51 ` Matthew Wilcox
2016-08-12 4:01 ` Josh Triplett
2016-08-12 4:07 ` Matthew Wilcox
2016-08-12 5:29 ` Alexey Dobriyan
2016-08-12 5:38 ` Michael S. Tsirkin
2016-08-12 6:04 ` Julia Lawall
2016-08-12 6:09 ` Michael S. Tsirkin
2016-08-12 6:23 ` Matthew Wilcox
2016-08-12 6:37 ` Julia Lawall
2016-08-12 5:50 ` Matthew Wilcox
2016-08-04 7:15 ` NeilBrown
2016-08-04 11:19 ` Julia Lawall
2016-07-22 7:03 ` David Howells
2016-07-22 10:10 ` Alexey Dobriyan
2016-07-22 10:13 ` David Howells
2016-07-22 10:22 ` Alexey Dobriyan
2016-07-22 10:53 ` Vlastimil Babka
2016-07-22 11:05 ` David Howells
2016-07-22 17:18 ` Julia Lawall
2016-07-22 18:19 ` Dmitry Torokhov
2016-07-22 19:43 ` Guenter Roeck
2016-07-28 3:40 ` Steven Rostedt
2016-07-28 7:12 ` David Howells
2016-08-02 10:48 ` Jani Nikula
2016-08-04 11:31 ` David Woodhouse
2016-08-04 12:07 ` Jani Nikula
2016-07-22 11:19 ` David Howells
2016-07-22 12:44 ` Linus Walleij
2016-07-22 13:26 ` David Howells
2016-08-12 4:42 ` Michael S. Tsirkin
[not found] ` <871t1ulfvz.fsf@notabene.neil.brown.name>
2016-08-12 5:34 ` Michael S. Tsirkin
2016-08-12 6:23 ` NeilBrown
[not found] ` <87y442jytb.fsf@notabene.neil.brown.name>
2016-08-15 23:26 ` Michael S. Tsirkin
2016-08-12 6:23 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87lh0xf9xv.fsf@x220.int.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=ksummit-discuss@lists.linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox