From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: ksummit@lists.linux.dev, Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
Date: Tue, 5 Aug 2025 19:55:04 +0100
Message-ID: <13490467-2556-47cc-93d1-f54fe88ad06d@lucifer.local>
In-Reply-To: <2da83ff9881bef84a742c06e502c91178a78a8a3.camel@HansenPartnership.com>

(remembering to +cc Steven this time)

On Tue, Aug 05, 2025 at 02:34:40PM -0400, James Bottomley wrote:
> On Tue, 2025-08-05 at 18:55 +0100, Lorenzo Stoakes wrote:
> > On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > > On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> > > > Unavoidably, LLMs are the hot topic in tech right now, and are
> > > > here to stay.
> > > >
> > > > This poses unique problems:
> > > >
> > > > * Never before have people been able to generate as much content
> > > > that may, on a surface reading, seem valid whilst in reality
> > > > being quite the opposite.
> > > >
> > > > * Equally, LLMs can introduce very subtle mistakes that humans
> > > > find difficult to pick up on - humans implicitly assume that
> > > > the classes of errors they will encounter are the kinds other
> > > > humans would make - AI defeats that instinct.
> > >
> > > Do you have any examples of this?  I've found the opposite to be
> > > true:
> >
> > Sure - Steven encountered this in [1].
> >
> > As he says there:
> >
> > "If I had known, I would have examined the patch a little more
> > thoroughly,  and would have discovered a very minor mistake in the
> > patch."
>
> Heh, well now you've made me look, it seems that the minor mistake is
> adding at tail instead of head?  That seems to be because the hash list
> API doesn't have a head add ...
>
> I wouldn't really call that a subtle problem because the LLM would have
> picked up the head to tail conversion if we'd had an at head API for it
> to learn from.

You see, I feel like whatever example I provide would provoke a response
like this :)
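
But to make that particular one concrete: adding at the tail rather than
the head silently changes iteration order. Here is a contrived userspace
sketch (deliberately not the kernel hlist API, just an illustration) of
why a head-vs-tail swap survives a surface reading:

#include <stdio.h>
#include <stdlib.h>

struct node {
        int val;
        struct node *next;
};

/* Insert at the front of the list. */
static void add_head(struct node **list, int val)
{
        struct node *n = malloc(sizeof(*n)); /* error handling elided */

        n->val = val;
        n->next = *list;
        *list = n;
}

/* Insert at the end of the list. */
static void add_tail(struct node **list, int val)
{
        struct node *n = malloc(sizeof(*n)); /* error handling elided */

        n->val = val;
        n->next = NULL;
        while (*list)
                list = &(*list)->next;
        *list = n;
}

int main(void)
{
        struct node *a = NULL, *b = NULL;

        for (int i = 1; i <= 3; i++) {
                add_head(&a, i);
                add_tail(&b, i);
        }

        /* a iterates 3 2 1, b iterates 1 2 3 - same elements, different
         * order, and neither version 'looks' wrong in isolation. */
        for (struct node *n = a; n; n = n->next)
                printf("%d ", n->val);
        printf("\n");
        for (struct node *n = b; n; n = n->next)
                printf("%d ", n->val);
        printf("\n");
        return 0;
}

Whether that difference matters depends entirely on whether some consumer
relies on ordering - which is precisely the sort of question a reviewer
only thinks to ask if they already suspect the error is possible.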

I also encountered an LLM insisting that MAINTAINERS contained a section
that doesn't in fact exist - a subtly incorrect one. 'It' insisted that
this was true and that I could check the file (it was wrong).

I've asked for explanations of concepts that it's got confidently,
misleadingly wrong.

This article:

https://rdel.substack.com/p/rdel-57-what-are-the-most-common

references common bugs generated by code-generating LLM machinery.

In interacting with chatbots I've encountered _very confidently_ stated
claims that would be convincing were you not expert enough to determine
otherwise.

I could go and try to gather a bunch of examples (hey, this is a proposal
right? If it were accepted then I'd be able to spend time firming stuff up
like this ;)

But I come back to the fundamental point that we are statistically
inferring information against an infinite number of possibilities. It is
simply mathematically inevitable that there will be gaps, and errors can
very conceivably be subtle as well as glaring.

Either are problematic.
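
To give a crude sense of scale (a back-of-the-envelope illustration, not
a rigorous bound): even restricting ourselves to token sequences of
length 50 drawn from a vocabulary of only 100 symbols gives 100^50 =
10^100 candidate sequences. Any training corpus we could ever assemble
covers a vanishingly small fraction of that space, so some outputs are
necessarily interpolated guesses rather than recalled patterns.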

>
> > The algorithm is determining likely output based on statistics, and
> > therefore density of input. Since in reality one can write infinite
> > programs, it's mathematically inevitable that an LLM will have to
> > 'infer' answers.
> >
> > That inference has no basis in dynamics - that is, a model of reality
> > that it can use to determine answers - rather it will, in essence,
> > provide a random result.
> >
> > If there is a great deal of input (e.g. C programs), then that
> > inference is likely to manifest in very subtle errors. See [2] for a
> > thoughtful exploration from an AI expert on the topic of statistics
> > vs. dynamics, and [3] for a broader exploration of the topic from the
> > same author.
>
> Amazingly enough when you're trying to sell a new thing, you become
> very down on what you see as the old thing (bcachefs vs btrfs ...?)

Come on James ;) I think this is rather an unfair dismissal of those
articles, which are well-reasoned and thoughtful.

I think the discussion around statistical inference vs. dynamic modelling
is fairly profound and insightful.

Also that comparison... ;)

>
> >
> > [1]: https://lore.kernel.org/workflows/20250724194556.105803db@gandalf.local.home/
> > [2]: https://blog.piekniewski.info/2016/11/01/statistics-and-dynamics/
> > [3]: https://blog.piekniewski.info/2023/04/09/ai-reflections/
> >
> [...]
> > > > * The kernel is uniquely sensitive to erroneous (especially
> > > > subtly erroneous) code - even small errors can be highly
> > > > consequential. We use a programming language that can almost be
> > > > defined by its lack of any kind of safety, and in some
> > > > subsystems patches are simply taken if no obvious problems exist,
> > > > making us rather vulnerable to this.
> > >
> > > I think that's really overlooking the fact that if properly trained
> > > (a somewhat big *if* depending on the model) AI should be very good
> > > at writing safe code in unsafe languages.  However it takes C
> > > specific
> >
> > I fundamentally disagree.
> >
> > The consequences of even extremely small mistakes can be very serious
> > in C, as the language does little to nothing for you.
> >
> > No matter how much data it absorbs it cannot span the entire space of
> > all possible programs or even anywhere close.
>
> Neither can a human and we get by on mostly pattern matching ourselves
> ...

This is a very typical counterargument. The problem is that humans are not
able to generate these kinds of errors at this kind of scale in the same
way LLMs can*, and humans implicitly expect 'human-like' errors, which we
cannot assume will arise in this output.

We tend to have a fairly constrained set of errors that we make, which you
can usually reason about - and really, maintainers pattern-match on errors
made as much as patch writers pattern-match on writing them.

Breaking these assumptions in unusual ways is likely to be problematic.

*Excepting certain coccinelle contributors of course...

>
> > I mean again, I apply the arguments above as to why I feel this is
> > _fundamental_ to the approach.
> >
> > Kernel code is also very specific and has characteristics that render
> > it different from userland. We must consider a great many more things
> > that would be handled for us were we userland - interrupts, the
> > context we are in, locks of all varieties, etc. etc.
> >
> > While there's a lot of kernel code (tens of millions of lines), for
> > an LLM that is very small, and we simply cannot generate more.
> >
> > Yes it can eat up all the C it can, but that isn't quite the same.
>
> You seem to be assuming training is simply dumping the data corpus and
> letting the model fend for itself.  It isn't; it's a more painstaking
> process that finds the mistakes in the output and gets the model to
> improve itself ... it is more like human teaching.

No, I assume that statistical inference cannot be established for an
effectively infinite problem space, which I think is reasonable.

>
> I'm not saying current AI is perfect, but I am saying that most of the
> issues with current AI can be traced to training problems which can be
> corrected in the model if anyone cares enough to do it.  The useful
> signal is that in all badly trained models I've seen the AI confidence
> score is really low because of the multiple matches in different areas
> that proper training would separate.  That's why I think AI confidence
> score should be the first thing we ask for.

Again, I've no issue with this confidence score as a data point, though we
do need to assess how reliable it is.
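
(If we did adopt it, I'd imagine something as lightweight as a patch
trailer - the tag names below are purely hypothetical, not a format
proposal:

  Generated-by: some-llm-tool v1.2
  AI-Confidence: 0.62

with the understanding that the number is the model's self-report, and we
would still need data on how well it correlates with actual defects.)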

>
> Regards,
>
> James
>

I think we're diverging a little from the broader point being made here -
that we need a clear policy on this - into the details of what kinds of
problems LLMs pose.

So even if we agree to disagree on some of these details, I feel we can
(probably? :) agree on the need for a coherent approach and a clear policy
on this.

And to be clear, I'm not opposing LLMs per se, I'm simply underlining the
kinds of issues we ought to be cautious of.

Ultimately I think we ought to let individual maintainers decide what they
will/won't accept (within reason).

Cheers, Lorenzo
