ksummit.lists.linux.dev archive mirror
* [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
@ 2025-08-05 16:03 Lorenzo Stoakes
  2025-08-05 16:43 ` James Bottomley
  2025-08-05 18:10 ` H. Peter Anvin
  0 siblings, 2 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 16:03 UTC (permalink / raw)
  To: ksummit

Unavoidably, LLMs are the hot topic in tech right now, and are here to
stay.

This poses unique problems:

* Never before have people been able to generate so much content that may,
  on a surface reading, seem valid whilst in reality being quite the
  opposite.

* Equally, LLMs can introduce very subtle mistakes that humans find
  difficult to pick up on - humans implicitly assume that the classes of
  errors they will encounter are the kinds other humans would make - AI
  defeats that instinct.

* The kernel is uniquely sensitive to erroneous (especially subtly
  erroneous) code - even small errors can be highly consequential. We use a
  programming language that can almost be defined by its lack of any kind
  of safety, and in some subsystems patches are simply taken if no obvious
  problems exist, making us rather vulnerable to this.

* On the other hand, there are use cases which are useful - test data/code
  generation, summarisation, smart auto-complete - so it'd perhaps be
  foolish to entirely dismiss AI.

A very important non-technical point we must consider is that, the second
we even appear to be open to AI submission of _any_ kind, the press will
inevitably report on it gleefully, likely with oversimplified headlines
like 'Linux accepts AI patches'.

The moment that happens, we are likely to see a significant uptick in AI
submissions whether we like it or not.

I propose that we establish the broad rules as they pertain to the kernel,
and would like to bring the discussion to the Maintainers Summit so we can
determine what those should be.

It's important to get a sense of how maintainers feel about this - whether
what is proposed is opt-in or opt-out - and how we actually implement this.

There has been discussion on-list about this (see [0]), with many
suggestions made, including a 'traffic light' system per-subsystem; however,
many open questions remain - the devil is in the details.

[0]: https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.org/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 16:03 [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process Lorenzo Stoakes
@ 2025-08-05 16:43 ` James Bottomley
  2025-08-05 17:11   ` Mark Brown
                     ` (3 more replies)
  2025-08-05 18:10 ` H. Peter Anvin
  1 sibling, 4 replies; 43+ messages in thread
From: James Bottomley @ 2025-08-05 16:43 UTC (permalink / raw)
  To: Lorenzo Stoakes, ksummit

On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> Unavoidably, LLMs are the hot topic in tech right now, and are here
> to stay.
> 
> This poses unique problems:
> 
> * Never before have people been able to generate as much content that
> may, on a surface reading, seem valid whilst in reality being quite
> the opposite.
> 
> * Equally, LLM's can introduce very subtle mistakes that humans find
> difficult to pick up upon - humans implicitly assume that the classes
> of errors they will encounter are the kinds other humans would make -
> AI defeats that instinct.

Do you have any examples of this?  I've found the opposite to be true:
AI is capable of really big stupid mistakes when it hasn't seen enough
of the pattern, but I can't recall seeing it make something you'd
classify as a subtle mistake (I assume it could copy subtle mistakes
from wrong training data, so I'm not saying it can't, just that I
haven't seen any).

I think the big mistakes could possibly be avoided by asking people who
submit patches to also append the AI confidence score:

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept/accuracy-confidence?view=doc-intel-4.0.0

So we know how much similar training the model has seen before coming
to any conclusion about the value of the output.
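
For concreteness, I'd imagine something as lightweight as an extra
trailer on the patch - the tag name and format below are entirely made
up, purely to illustrate the idea:

    Signed-off-by: Jane Developer <jane.developer@example.com>
    AI-Confidence: 0.42 (model used and the task it was asked to do)

A low number wouldn't mean automatic rejection, just "review this with
rather more suspicion than usual".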

> * The kernel is uniquely sensitive to erroneous (especially subtly
> erroneous) code - even small errors can be highly consequential. We
> use a programming language that can almost be defined by its lack of
> any kind   of safety, and in some subsystems patches are simply taken
> if no obvious problems exist, making us rather vulnerable to this.

I think that's really overlooking the fact that if properly trained (a
somewhat big *if* depending on the model) AI should be very good at
writing safe code in unsafe languages.  However it takes C specific
training to do this, so any LLM that's absorbed a load of rust, python
and javascript from the internet will be correspondingly bad at writing
safe C code.  Hence the origin of the LLM and its training corpus would
be a key factor in deciding to trust it.

> * On the other hand, there are use cases which are useful - test
> data/code generation, summarisation, smart auto-complete - so it'd
> perhaps be foolish to entirely dismiss AI.

Patch backporting is another such nice use.

> A very important non-technical point we must consider is that, the
> second we even appear to be open to AI submission of _any_ kind, the
> press will inevitably report on it gleefully, likely with
> oversimplified headlines like 'Linux accepts AI patches'.

Oh, I think simply accepting AI patches is old news:

https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-as-30percent-of-microsoft-code-is-written-by-ai.html

> The moment that happens, we are likely to see a significant uptick in
> AI submissions whether we like it or not.
> 
> I propose that we establish the broad rules as they pertain to the
> kernel, and would like to bring the discussion to the Maintainer's
> Summit so we can determine what those should be.
> 
> It's important to get a sense of how maintainers feel about this -
> whether what is proposed is opt-in or opt-out - and how we actually
> implement this.
> 
> There has been discussion on-list about this (see [0]), with many
> suggestions made including a 'traffic light' system per-subsystem,
> however many open questions remain - the devil is in the details.
> 
> [0]:
> https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.or
> g/

We're already getting AI generated bug reports from what I can tell. 
It would be really helpful to see the AI confidence score for them as
well.

Regards,

James



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 16:43 ` James Bottomley
@ 2025-08-05 17:11   ` Mark Brown
  2025-08-05 17:23     ` James Bottomley
  2025-08-05 18:01     ` Lorenzo Stoakes
  2025-08-05 17:17   ` Stephen Hemminger
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 43+ messages in thread
From: Mark Brown @ 2025-08-05 17:11 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lorenzo Stoakes, ksummit

[-- Attachment #1: Type: text/plain, Size: 1045 bytes --]

On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:

> > * On the other hand, there are use cases which are useful - test
> > data/code generation, summarisation, smart auto-complete - so it'd
> > perhaps be foolish to entirely dismiss AI.

> Patch backporting is another such nice use.

Patch backporting sounds pretty scary to me, it's the sort of thing
where extra context that needs to be accounted for is very likely to
come up (eg, assumptions you can make about existing state or
sanitisation).  That trips up humans often enough and doesn't seem like
it's playing to the strengths advertised for LLMs.

TBH I'm not thrilled about the general 'test code is trivial' assumption
either; unstable test code and test code that doesn't cover what people
think it covers are both problems.  The issues when things go wrong are
less severe than in the kernel itself, but things still need to be
maintained and we already have issues with people being dismissive of
the selftests.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 16:43 ` James Bottomley
  2025-08-05 17:11   ` Mark Brown
@ 2025-08-05 17:17   ` Stephen Hemminger
  2025-08-05 17:55   ` Lorenzo Stoakes
  2025-08-05 21:58   ` Jiri Kosina
  3 siblings, 0 replies; 43+ messages in thread
From: Stephen Hemminger @ 2025-08-05 17:17 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lorenzo Stoakes, ksummit

On Tue, 05 Aug 2025 12:43:38 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> > Unavoidably, LLMs are the hot topic in tech right now, and are here
> > to stay.
> > 
> > This poses unique problems:
> > 
> > * Never before have people been able to generate as much content that
> > may, on a surface reading, seem valid whilst in reality being quite
> > the opposite.
> > 
> > * Equally, LLM's can introduce very subtle mistakes that humans find
> > difficult to pick up upon - humans implicitly assume that the classes
> > of errors they will encounter are the kinds other humans would make -
> > AI defeats that instinct.  
> 
> Do you have any examples of this?  I've found the opposite to be true:
> AI is capable of really big stupid mistakes when it hasn't seen enough
> of the pattern, but I can't recall seeing it make something you'd
> classify as a subtle mistake (I assume it could copy subtle mistakes
> from wrong training data, so I'm not saying it can't, just that I
> haven't seen any).
> 
> I think the big mistakes could possibly be avoided by asking people who
> submit patches to also append the AI confidence score:
> 
> https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept/accuracy-confidence?view=doc-intel-4.0.0
> 
> So we know how much similar training the model has seen before coming
> to any conclusion about the value of the output.
> 
> > * The kernel is uniquely sensitive to erroneous (especially subtly
> > erroneous) code - even small errors can be highly consequential. We
> > use a programming language that can almost be defined by its lack of
> > any kind   of safety, and in some subsystems patches are simply taken
> > if no obvious problems exist, making us rather vulnerable to this.  
> 
> I think that's really overlooking the fact that if properly trained (a
> somewhat big *if* depending on the model) AI should be very good at
> writing safe code in unsafe languages.  However it takes C specific
> training to do this, so any LLM that's absorbed a load of rust, python
> and javascript from the internet will be correspondingly bad at writing
> safe C code.  Hence the origin of the LLM and its training corpus would
> be a key factor in deciding to trust it.
> 
> > * On the other hand, there are use cases which are useful - test
> > data/code generation, summarisation, smart auto-complete - so it'd
> > perhaps be foolish to entirely dismiss AI.  
> 
> Patch backporting is another such nice use.
> 
> > A very important non-technical point we must consider is that, the
> > second we even appear to be open to AI submission of _any_ kind, the
> > press will inevitably report on it gleefully, likely with
> > oversimplified headlines like 'Linux accepts AI patches'.  
> 
> Oh, I think simply accepting AI patches is old news:
> 
> https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-as-30percent-of-microsoft-code-is-written-by-ai.html
> 
> > The moment that happens, we are likely to see a significant uptick in
> > AI submissions whether we like it or not.
> > 
> > I propose that we establish the broad rules as they pertain to the
> > kernel, and would like to bring the discussion to the Maintainer's
> > Summit so we can determine what those should be.
> > 
> > It's important to get a sense of how maintainers feel about this -
> > whether what is proposed is opt-in or opt-out - and how we actually
> > implement this.
> > 
> > There has been discussion on-list about this (see [0]), with many
> > suggestions made including a 'traffic light' system per-subsystem,
> > however many open questions remain - the devil is in the details.
> > 
> > [0]:
> > https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.or
> > g/  
> 
> We're already getting AI generated bug reports from what I can tell. 
> It would be really helpful to see the AI confidence score for them as
> well.
> 
> Regards,
> 
> James
> 
> 
> 

I have been toying with AI for some things that got stuck on my TODO
list for years (like writing tests).  I have to agree with James: the
errors in that code are not subtle.  Mostly they are things that never
work, because the AI has no context about what it is doing - especially
when porting between environments.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:11   ` Mark Brown
@ 2025-08-05 17:23     ` James Bottomley
  2025-08-05 17:43       ` Sasha Levin
  2025-08-05 18:16       ` Mark Brown
  2025-08-05 18:01     ` Lorenzo Stoakes
  1 sibling, 2 replies; 43+ messages in thread
From: James Bottomley @ 2025-08-05 17:23 UTC (permalink / raw)
  To: Mark Brown; +Cc: Lorenzo Stoakes, ksummit

[-- Attachment #1: Type: text/plain, Size: 2129 bytes --]

On Tue, 2025-08-05 at 18:11 +0100, Mark Brown wrote:
> On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> 
> > > * On the other hand, there are use cases which are useful - test
> > > data/code generation, summarisation, smart auto-complete - so
> > > it'd perhaps be foolish to entirely dismiss AI.
> 
> > Patch backporting is another such nice use.
> 
> Patch backporting sounds pretty scary to me, it's the sort of thing
> where extra context that needs to be accounted for is very likely to
> come up (eg, assumptions you can make about existing state or
> santisation).

If you think about it, the git history contains the exact patch path
between where the patch was applied and where you want to apply it. 
That's a finite data set which LLMs can be trained to work nicely with.

>   That trips up humans often enough and doesn't seem like it's
> playing to the strengths advertised for LLMs.

Humans don't look at the patch path (or use something broad like a
range scan).  The AI can be patient enough to actually go over it all.

> TBH I'm not thrilled about the general test code is trivial
> assumption either,

I don't think anyone who trains AI thinks testing is trivial.  It does
take special training for AI to be good at test writing.

>  unstable test code or test code that doesn't cover what people think
> it covers are both problems.

Test coverage and constructing tests for coverage is another place AI
can help.  Especially given coverage is a measurable quantity which
makes training easier.

>   The issues when things go wrong are less severe than the kernel
> itself but things still need to be maintained and we already have
> issues with people being dismissive of the selftests.

Well, having just spent ages figuring out how to run a subset of the
bpf tests, I'd say our selftests are very eccentric ... in that each
test set runs in a completely different way from any of the others and
knowledge from one selftest area doesn't apply to a different one.

Regards,

James


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:23     ` James Bottomley
@ 2025-08-05 17:43       ` Sasha Levin
  2025-08-05 17:58         ` Lorenzo Stoakes
  2025-08-05 18:16       ` Mark Brown
  1 sibling, 1 reply; 43+ messages in thread
From: Sasha Levin @ 2025-08-05 17:43 UTC (permalink / raw)
  To: James Bottomley; +Cc: Mark Brown, Lorenzo Stoakes, ksummit

On Tue, Aug 05, 2025 at 01:23:18PM -0400, James Bottomley wrote:
>On Tue, 2025-08-05 at 18:11 +0100, Mark Brown wrote:
>> On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
>> > On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
>>
>> > > * On the other hand, there are use cases which are useful - test
>> > > data/code generation, summarisation, smart auto-complete - so
>> > > it'd perhaps be foolish to entirely dismiss AI.
>>
>> > Patch backporting is another such nice use.
>>
>> Patch backporting sounds pretty scary to me, it's the sort of thing
>> where extra context that needs to be accounted for is very likely to
>> come up (eg, assumptions you can make about existing state or
>> santisation).
>
>If you think about it, the git history contains the exact patch path
>between where the patch was applied and where you want to apply it.
>That's a finite data set which LLMs can be trained to work nicely with.

Most of the patches that fail to backport and get a "FAILED:" mail as a
result are really just either a trivial context conflict or a missing
dependency.

Resolving those is within the realm of a "junior engineer" which I
suspect AI would tackle fairly well.
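
To make "trivial context conflict" concrete - an invented illustration,
not taken from any real FAILED mail, with made-up function names:

@@ -102,4 +102,4 @@ static int frob_widget(struct widget *w)
 	if (!w)
 		return -EINVAL;
-	w->count++;
+	w->count += 2;
 	update_stats(w);

If the stable tree still spells that last context line
update_widget_stats(w), the hunk no longer matches and the patch fails
to apply, even though the one-line change itself has nothing to do with
the rename.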

The other ~10-20% are indeed something more complex, due to things like
a major rewrite/refactor/etc of a subsystem, which I wouldn't trust the
current state-of-the-art AI to tackle. But maybe a few years from now?

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 16:43 ` James Bottomley
  2025-08-05 17:11   ` Mark Brown
  2025-08-05 17:17   ` Stephen Hemminger
@ 2025-08-05 17:55   ` Lorenzo Stoakes
  2025-08-05 18:23     ` Lorenzo Stoakes
                       ` (2 more replies)
  2025-08-05 21:58   ` Jiri Kosina
  3 siblings, 3 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 17:55 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> > Unavoidably, LLMs are the hot topic in tech right now, and are here
> > to stay.
> >
> > This poses unique problems:
> >
> > * Never before have people been able to generate as much content that
> > may, on a surface reading, seem valid whilst in reality being quite
> > the opposite.
> >
> > * Equally, LLM's can introduce very subtle mistakes that humans find
> > difficult to pick up upon - humans implicitly assume that the classes
> > of errors they will encounter are the kinds other humans would make -
> > AI defeats that instinct.
>
> Do you have any examples of this?  I've found the opposite to be true:

Sure - Steven encountered this in [1].

As he says there:

"If I had known, I would have examined the patch a little more thoroughly,
 and would have discovered a very minor mistake in the patch."

The algorithm is determining likely output based on statistics, and
therefore on the density of its input. Since in reality one can write an
infinite number of programs, it's mathematically inevitable that an LLM
will have to 'infer' answers.

That inference has no basis in dynamics - that is, a model of reality it
can use to determine answers - rather it will, in essence, provide a
random result.

Where there is a great deal of input (e.g. C programs), that inference is
likely to manifest as very subtle errors. See [2] for a thoughtful
exploration from an AI expert on the topic of statistics vs. dynamics, and
[3] for a broader exploration of the topic from the same author.

[1]: https://lore.kernel.org/workflows/20250724194556.105803db@gandalf.local.home/
[2]: https://blog.piekniewski.info/2016/11/01/statistics-and-dynamics/
[3]: https://blog.piekniewski.info/2023/04/09/ai-reflections/


> AI is capable of really big stupid mistakes when it hasn't seen enough
> of the pattern, but I can't recall seeing it make something you'd
> classify as a subtle mistake (I assume it could copy subtle mistakes
> from wrong training data, so I'm not saying it can't, just that I
> haven't seen any).

It's not from incorrect training data - it's fundamental to how LLMs
work.

>
> I think the big mistakes could possibly be avoided by asking people who
> submit patches to also append the AI confidence score:
>
> https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept/accuracy-confidence?view=doc-intel-4.0.0

That's interesting, though I don't know how reliable this might be.

However it's for exactly this kind of input that I proposed the topic :)

>
> So we know how much similar training the model has seen before coming
> to any conclusion about the value of the output.
>
> > * The kernel is uniquely sensitive to erroneous (especially subtly
> > erroneous) code - even small errors can be highly consequential. We
> > use a programming language that can almost be defined by its lack of
> > any kind   of safety, and in some subsystems patches are simply taken
> > if no obvious problems exist, making us rather vulnerable to this.
>
> I think that's really overlooking the fact that if properly trained (a
> somewhat big *if* depending on the model) AI should be very good at
> writing safe code in unsafe languages.  However it takes C specific

I fundamentally disagree.

The consequences of even extremely small mistakes can be very serious in C,
as the language does little to nothing for you.

No matter how much data it absorbs it cannot span the entire space of all
possible programs or even anywhere close.

I mean again, I apply the arguments above as to why I feel this is
_fundamental_ to the approach.

Kernel code is also very specific and has characteristics that render it
different from userland. We must consider a great many more things that
would be handled for us were we userland - interrupts, the context we are
in, locks of all varieties, etc. etc.

While there's a lot of kernel code (tens of millions of lines), for an
LLM that is very small, and we simply cannot generate more.

Yes, it can eat up all the C code it can find, but that isn't quite the
same.

> training to do this, so any LLM that's absorbed a load of rust, python
> and javascript from the internet will be correspondingly bad at writing
> safe C code.  Hence the origin of the LLM and its training corpus would
> be a key factor in deciding to trust it.
>
> > * On the other hand, there are use cases which are useful - test
> > data/code generation, summarisation, smart auto-complete - so it'd
> > perhaps be foolish to entirely dismiss AI.
>
> Patch backporting is another such nice use.

As long as carefully checked :)

>
> > A very important non-technical point we must consider is that, the
> > second we even appear to be open to AI submission of _any_ kind, the
> > press will inevitably report on it gleefully, likely with
> > oversimplified headlines like 'Linux accepts AI patches'.
>
> Oh, I think simply accepting AI patches is old news:
>
> https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-as-30percent-of-microsoft-code-is-written-by-ai.html

That doesn't pertain to the kernel specifically.

Of course code being written by AI is old news, but there's no doubt that
tech publications would JUMP on anything even suggesting we are open in
some broad way to AI submissions.

Given Linus's rather neutral public position on AI, it'd certainly mark
what _would be perceived_, in my view, as a sea change on this.

>
> > The moment that happens, we are likely to see a significant uptick in
> > AI submissions whether we like it or not.
> >
> > I propose that we establish the broad rules as they pertain to the
> > kernel, and would like to bring the discussion to the Maintainer's
> > Summit so we can determine what those should be.
> >
> > It's important to get a sense of how maintainers feel about this -
> > whether what is proposed is opt-in or opt-out - and how we actually
> > implement this.
> >
> > There has been discussion on-list about this (see [0]), with many
> > suggestions made including a 'traffic light' system per-subsystem,
> > however many open questions remain - the devil is in the details.
> >
> > [0]:
> > https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.or
> > g/
>
> We're already getting AI generated bug reports from what I can tell.
> It would be really helpful to see the AI confidence score for them as
> well.

That is definitely an interesting additional data point that could
potentially be helpful here! I wasn't aware of this so thanks for that :)

>
> Regards,
>
> James
>
>

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:43       ` Sasha Levin
@ 2025-08-05 17:58         ` Lorenzo Stoakes
  0 siblings, 0 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 17:58 UTC (permalink / raw)
  To: Sasha Levin; +Cc: James Bottomley, Mark Brown, ksummit

On Tue, Aug 05, 2025 at 01:43:00PM -0400, Sasha Levin wrote:
> On Tue, Aug 05, 2025 at 01:23:18PM -0400, James Bottomley wrote:
> > On Tue, 2025-08-05 at 18:11 +0100, Mark Brown wrote:
> > > On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > > > On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> > >
> > > > > * On the other hand, there are use cases which are useful - test
> > > > > data/code generation, summarisation, smart auto-complete - so
> > > > > it'd perhaps be foolish to entirely dismiss AI.
> > >
> > > > Patch backporting is another such nice use.
> > >
> > > Patch backporting sounds pretty scary to me, it's the sort of thing
> > > where extra context that needs to be accounted for is very likely to
> > > come up (eg, assumptions you can make about existing state or
> > > santisation).
> >
> > If you think about it, the git history contains the exact patch path
> > between where the patch was applied and where you want to apply it.
> > That's a finite data set which LLMs can be trained to work nicely with.
>
> Most of the patches that fail to backport and get a "FAILED:" mail as a
> result are really just either a trivial context conflict ar a missing
> dependency.

Yeah I'm not surprised by that!

>
> Resolving those is within the realm of a "junior engineer" which I
> suspect AI would tackle fairly well.

Right.

>
> The other ~10-20% are indeed something more complex due to things like a
> major rewrite/refactor/etc of a subsystem, where I wouldn't trust the
> current state of the art AI to tackle. But mayble a few years from now?

Well, I have my doubts :) I think it's always a trade-off of scrutiny +
class of error vs. time gains.

But this is, of course, a point of debate.

>
> --
> Thanks,
> Sasha

By the way Sasha, I was planning to cc you, as this is obviously entirely
related to your series, and I'm more than happy to propose a joint
discussion or something like that - I just strictly followed the rules in
the maintainers summit email and didn't know whether I _could_ cc :)

I would enjoy having a friendly discussion about it in person regardless :)

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:11   ` Mark Brown
  2025-08-05 17:23     ` James Bottomley
@ 2025-08-05 18:01     ` Lorenzo Stoakes
  2025-08-05 18:46       ` Mark Brown
  1 sibling, 1 reply; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 18:01 UTC (permalink / raw)
  To: Mark Brown; +Cc: James Bottomley, ksummit

On Tue, Aug 05, 2025 at 06:11:31PM +0100, Mark Brown wrote:
> TBH I'm not thrilled about the general test code is trivial assumption
> either, unstable test code or test code that doesn't cover what people
> think it covers are both problems.  The issues when things go wrong are
> less severe than the kernel itself but things still need to be
> maintained and we already have issues with people being dismissive of
> the selftests.

Yeah, I didn't mean to downplay this or suggest it's not important - as
you've (possibly :P) seen, I write a lot of it myself and consider it vital.

However the point is that we can:

- Generate scenarios relatively quickly that way.
- Resultant mistakes are at least _less_ impactful. And, you'd hope, they
  would come out in the wash more easily.

However, I wonder whether it might be better simply to restrict LLMs to test
data generation or the like.

These are again all exactly the kinds of things that make this an important
topic to discuss I believe :)

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 16:03 [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process Lorenzo Stoakes
  2025-08-05 16:43 ` James Bottomley
@ 2025-08-05 18:10 ` H. Peter Anvin
  2025-08-05 18:19   ` Lorenzo Stoakes
  2025-08-06  5:49   ` Julia Lawall
  1 sibling, 2 replies; 43+ messages in thread
From: H. Peter Anvin @ 2025-08-05 18:10 UTC (permalink / raw)
  To: Lorenzo Stoakes, ksummit

On August 5, 2025 9:03:18 AM PDT, Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>Unavoidably, LLMs are the hot topic in tech right now, and are here to
>stay.
>
>This poses unique problems:
>
>* Never before have people been able to generate as much content that may,
>  on a surface reading, seem valid whilst in reality being quite the
>  opposite.
>
>* Equally, LLM's can introduce very subtle mistakes that humans find
>  difficult to pick up upon - humans implicitly assume that the classes of
>  errors they will encounter are the kinds other humans would make - AI
>  defeats that instinct.
>
>* The kernel is uniquely sensitive to erroneous (especially subtly
>  erroneous) code - even small errors can be highly consequential. We use a
>  programming language that can almost be defined by its lack of any kind
>  of safety, and in some subsystems patches are simply taken if no obvious
>  problems exist, making us rather vulnerable to this.
>
>* On the other hand, there are use cases which are useful - test data/code
>  generation, summarisation, smart auto-complete - so it'd perhaps be
>  foolish to entirely dismiss AI.
>
>A very important non-technical point we must consider is that, the second
>we even appear to be open to AI submission of _any_ kind, the press will
>inevitably report on it gleefully, likely with oversimplified headlines
>like 'Linux accepts AI patches'.
>
>The moment that happens, we are likely to see a significant uptick in AI
>submissions whether we like it or not.
>
>I propose that we establish the broad rules as they pertain to the kernel,
>and would like to bring the discussion to the Maintainer's Summit so we can
>determine what those should be.
>
>It's important to get a sense of how maintainers feel about this - whether
>what is proposed is opt-in or opt-out - and how we actually implement this.
>
>There has been discussion on-list about this (see [0]), with many
>suggestions made including a 'traffic light' system per-subsystem, however
>many open questions remain - the devil is in the details.
>
>[0]:https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.org/
>

It would seem to me to be far more useful to get AI to help find
hard-to-see but perhaps common problems, or to tidy up code (like
Coccinelle on steroids).

That is, things whose correctness is easy to see post hoc, but which can
be hard or laborious to do.
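
The kmalloc()+memset() to kzalloc() conversion is the canonical example
of that class - trivially checkable after the fact, tedious to hunt down
by hand, and there are already Coccinelle scripts for it under
scripts/coccinelle/ in the tree.  A sketch only, the surrounding code is
invented:

-	buf = kmalloc(len, GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
-	memset(buf, 0, len);
+	buf = kzalloc(len, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;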

Another genuinely good use of AI, especially for non-English speakers, is
tidying up their patch comments and other documentation.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:23     ` James Bottomley
  2025-08-05 17:43       ` Sasha Levin
@ 2025-08-05 18:16       ` Mark Brown
  1 sibling, 0 replies; 43+ messages in thread
From: Mark Brown @ 2025-08-05 18:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lorenzo Stoakes, ksummit

[-- Attachment #1: Type: text/plain, Size: 3286 bytes --]

On Tue, Aug 05, 2025 at 01:23:18PM -0400, James Bottomley wrote:
> On Tue, 2025-08-05 at 18:11 +0100, Mark Brown wrote:

> > Patch backporting sounds pretty scary to me, it's the sort of thing
> > where extra context that needs to be accounted for is very likely to
> > come up (eg, assumptions you can make about existing state or
> > santisation).

> If you think about it, the git history contains the exact patch path
> between where the patch was applied and where you want to apply it. 
> That's a finite data set which LLMs can be trained to work nicely with.

> >   That trips up humans often enough and doesn't seem like it's
> > playing to the strengths advertised for LLMs.

> Humans don't look at the patch path (or use something broad like a
> range scan).  The AI can be patient enough to actually go over it all.

The thing humans are usually doing in a situation like that is
remembering that someone changed something and why, and of course
spotting the new dependencies that came in.  I see what you're saying,
but I'm rather nervous about what people would actually do and how
effective the results would be, especially where things get complicated
and there are landmines.

> > TBH I'm not thrilled about the general test code is trivial
> > assumption either,

> I don't think anyone who trains AI thinks testing is trivial.  It does
> take special training for AI to be good at test writing.

I think a lot of the people saying "oh, we can just churn that out with
AI" kinds of things do have that sort of attitude.  This thread is far
from the first time I've seen people saying tests are a great
application, and it's usually offered as a contrast to the complicated
stuff in the kernel rather than from a consideration of the specific
benefits these tools might offer in this application.

> >  unstable test code or test code that doesn't cover what people think
> > it covers are both problems.

> Test coverage and constructing tests for coverage is another place AI
> can help.  Especially given coverage is a measurable quantity which
> makes training easier.

There's definitely some opportunity for specialist stuff there,
especially if you're just looking at measurable metrics like you're
mentioning.  Other tools in this area are also available of course!

> >   The issues when things go wrong are less severe than the kernel
> > itself but things still need to be maintained and we already have
> > issues with people being dismissive of the selftests.

> Well our selftests, having just spent ages figuring out how to run a
> subset of the bpf tests, are very eccentric ... in that each test set
> runs in a completely different way from any of the others and knowledge
> from one selftest area doesn't apply to a different one.

They should all run from the selftests harness, so the simple
running-them bit should at least be standard?  We do have some suites
that were thrown into the kernel with marginal integration with the
frameworks but they're generally fairly obvious as soon as you go in via
the standard interfaces.  I'm not saying the overall picture is amazing,
but I see a big part of it being a social problem with getting people to
take what we've got seriously.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:10 ` H. Peter Anvin
@ 2025-08-05 18:19   ` Lorenzo Stoakes
  2025-08-06  5:49   ` Julia Lawall
  1 sibling, 0 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 18:19 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: ksummit

On Tue, Aug 05, 2025 at 11:10:51AM -0700, H. Peter Anvin wrote:

>
> It would seem to me to be far more useful  to get AI to help find hard-to-see but perhaps common problems, or tidying up code (like Coccinelle on steroids.)
>
> Things that are easy to see post hoc are correct, but can be hard or laborious to do.

Right, as long as we get the RoI right on the efforts to check the results,
that's a great suggestion.

>
> Another genuinely good usage for AI is for especially non-English speakers to tidy up their patch comments and other documentation.

Yes, I have a lot of empathy for this and I gather non-native speakers find
it extraordinarily useful for this task in general.

This is a very good point as to why we have to be careful not to have a
broad-brush approach like 'no AI', as there are uses that do not cause
issues of concern but are of great use to people using them.

It would be rather egregious to demand that non-native speakers using LLMs
for this purpose disclose the fact in a way that would label the patch in
the same fashion as, say, a very broken LLM-generated kernel patch.

There really has to be a fine-grained approach to this - the devil truly is
in the details here.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:55   ` Lorenzo Stoakes
@ 2025-08-05 18:23     ` Lorenzo Stoakes
  2025-08-12 13:44       ` Steven Rostedt
  2025-08-05 18:34     ` James Bottomley
  2025-08-05 18:39     ` Sasha Levin
  2 siblings, 1 reply; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 18:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit, Steven Rostedt

+cc Steven

Apologies, I meant to +cc you here as I mention an issue you encountered,
and forgot of course :>)

Cheers, Lorenzo

On Tue, Aug 05, 2025 at 06:55:29PM +0100, Lorenzo Stoakes wrote:
> On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> > > Unavoidably, LLMs are the hot topic in tech right now, and are here
> > > to stay.
> > >
> > > This poses unique problems:
> > >
> > > * Never before have people been able to generate as much content that
> > > may, on a surface reading, seem valid whilst in reality being quite
> > > the opposite.
> > >
> > > * Equally, LLM's can introduce very subtle mistakes that humans find
> > > difficult to pick up upon - humans implicitly assume that the classes
> > > of errors they will encounter are the kinds other humans would make -
> > > AI defeats that instinct.
> >
> > Do you have any examples of this?  I've found the opposite to be true:
>
> Sure - Steven encountered this in [1].
>
> As he says there:
>
> "If I had known, I would have examined the patch a little more thoroughly,
>  and would have discovered a very minor mistake in the patch."
>
> The algorithm is determining likely output based on statistics, and
> therefore density of input. Since in reality one can write infinite
> programs, it's mathematically inevitable that an LLM will have to 'infer'
> answers.
>
> That inference has no basis in dynamics, that is a model of reality that it
> can use to determine answers, rather it will, in essence, provide a random
> result.
>
> If there is a great deal of input (e.g. C programs), then that inference is
> likely to manifest in very subtle errors. See [2] for a thoughtful
> exploration from an AI expert on the topic of statistics vs. dynamics, and
> [3] for a broader exploration of the topic from the same author.
>
> [1]:https://lore.kernel.org/workflows/20250724194556.105803db@gandalf.local.home/
> [2]:https://blog.piekniewski.info/2016/11/01/statistics-and-dynamics/
> [3]:https://blog.piekniewski.info/2023/04/09/ai-reflections/
>
>
> > AI is capable of really big stupid mistakes when it hasn't seen enough
> > of the pattern, but I can't recall seeing it make something you'd
> > classify as a subtle mistake (I assume it could copy subtle mistakes
> > from wrong training data, so I'm not saying it can't, just that I
> > haven't seen any).
>
> It's not from incorrect training data, it's fundamental to how LLMs
> work.
>
> >
> > I think the big mistakes could possibly be avoided by asking people who
> > submit patches to also append the AI confidence score:
> >
> > https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept/accuracy-confidence?view=doc-intel-4.0.0
>
> That's interesting, though I don't know how reliable this might be.
>
> However it's for exactly this kind of input that I proposed the topic :)
>
> >
> > So we know how much similar training the model has seen before coming
> > to any conclusion about the value of the output.
> >
> > > * The kernel is uniquely sensitive to erroneous (especially subtly
> > > erroneous) code - even small errors can be highly consequential. We
> > > use a programming language that can almost be defined by its lack of
> > > any kind   of safety, and in some subsystems patches are simply taken
> > > if no obvious problems exist, making us rather vulnerable to this.
> >
> > I think that's really overlooking the fact that if properly trained (a
> > somewhat big *if* depending on the model) AI should be very good at
> > writing safe code in unsafe languages.  However it takes C specific
>
> I fundamentally disagree.
>
> The consequences of even extremely small mistakes can be very serious in C,
> as the language does little to nothing for you.
>
> No matter how much data it absorbs it cannot span the entire space of all
> possible programs or even anywhere close.
>
> I mean again, I apply the arguments above as to why I feel this is
> _fundamental_ to the approach.
>
> Kernel code is also very specific and has characteristics that render it
> different from userland. We must consider a great many more things that
> would be handled for us were we userland - interrupts, the context we are
> in, locks of all varieties, etc. etc.
>
> While there's a lot of kernel code (~10's of millions of line), for an LLM
> that is very small, and we simply cannot generate more.
>
> Yes it can eat up all the C it can, but that isn't quite the same.
>
> > training to do this, so any LLM that's absorbed a load of rust, python
> > and javascript from the internet will be correspondingly bad at writing
> > safe C code.  Hence the origin of the LLM and its training corpus would
> > be a key factor in deciding to trust it.
> >
> > > * On the other hand, there are use cases which are useful - test
> > > data/code generation, summarisation, smart auto-complete - so it'd
> > > perhaps be foolish to entirely dismiss AI.
> >
> > Patch backporting is another such nice use.
>
> As long as carefully checked :)
>
> >
> > > A very important non-technical point we must consider is that, the
> > > second we even appear to be open to AI submission of _any_ kind, the
> > > press will inevitably report on it gleefully, likely with
> > > oversimplified headlines like 'Linux accepts AI patches'.
> >
> > Oh, I think simply accepting AI patches is old news:
> >
> > https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-as-30percent-of-microsoft-code-is-written-by-ai.html
>
> That doesn't pertain to the kernel specifically.
>
> Of course code being written by AI is old news, but there's no doubt that
> tech publications would JUMP on anything even suggesting we are open in
> some broad way to AI submissions.
>
> Given Linus's rather neutral public position on AI, it'd certainly mark
> what _would be perceived_, in my view, as a sea change on this.
>
> >
> > > The moment that happens, we are likely to see a significant uptick in
> > > AI submissions whether we like it or not.
> > >
> > > I propose that we establish the broad rules as they pertain to the
> > > kernel, and would like to bring the discussion to the Maintainer's
> > > Summit so we can determine what those should be.
> > >
> > > It's important to get a sense of how maintainers feel about this -
> > > whether what is proposed is opt-in or opt-out - and how we actually
> > > implement this.
> > >
> > > There has been discussion on-list about this (see [0]), with many
> > > suggestions made including a 'traffic light' system per-subsystem,
> > > however many open questions remain - the devil is in the details.
> > >
> > > [0]:
> > > https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.or
> > > g/
> >
> > We're already getting AI generated bug reports from what I can tell.
> > It would be really helpful to see the AI confidence score for them as
> > well.
>
> That is definitely an interesting additional data point that could
> potentially be helpful here! I wasn't aware of this so thanks for that :)
>
> >
> > Regards,
> >
> > James
> >
> >
>
> Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:55   ` Lorenzo Stoakes
  2025-08-05 18:23     ` Lorenzo Stoakes
@ 2025-08-05 18:34     ` James Bottomley
  2025-08-05 18:55       ` Lorenzo Stoakes
  2025-08-12 13:50       ` Steven Rostedt
  2025-08-05 18:39     ` Sasha Levin
  2 siblings, 2 replies; 43+ messages in thread
From: James Bottomley @ 2025-08-05 18:34 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: ksummit

On Tue, 2025-08-05 at 18:55 +0100, Lorenzo Stoakes wrote:
> On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> > > Unavoidably, LLMs are the hot topic in tech right now, and are
> > > here to stay.
> > > 
> > > This poses unique problems:
> > > 
> > > * Never before have people been able to generate as much content
> > > that may, on a surface reading, seem valid whilst in reality
> > > being quite the opposite.
> > > 
> > > * Equally, LLM's can introduce very subtle mistakes that humans
> > > find difficult to pick up upon - humans implicitly assume that
> > > the classes of errors they will encounter are the kinds other
> > > humans would make - AI defeats that instinct.
> > 
> > Do you have any examples of this?  I've found the opposite to be
> > true:
> 
> Sure - Steven encountered this in [1].
> 
> As he says there:
> 
> "If I had known, I would have examined the patch a little more
> thoroughly,  and would have discovered a very minor mistake in the
> patch."

Heh, well now you've made me look - it seems that the minor mistake is
adding at tail instead of head?  That seems to be because the hash list
API doesn't have a head add ...

I wouldn't really call that a subtle problem, because the LLM would have
picked up the head-to-tail conversion if we'd had an at-head API for it
to learn from.
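
For anyone following along without opening the link, the shape of the
difference is roughly this - a generic sketch using the ordinary
<linux/list.h> API with invented names, not the hash-list code the
patch actually touched:

	#include <linux/list.h>

	struct event {
		int id;
		struct list_head node;
	};

	static LIST_HEAD(pending);

	static void queue_event(struct event *ev)
	{
		list_add_tail(&ev->node, &pending);	/* at tail: FIFO */
		/* vs. list_add(&ev->node, &pending);	   at head: LIFO */
	}

Both variants are valid kernel code and differ only in the resulting
ordering - and, as above, having both flavours in an API gives a model
something to learn the conversion from.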

> The algorithm is determining likely output based on statistics, and
> therefore density of input. Since in reality one can write infinite
> programs, it's mathematically inevitable that an LLM will have to
> 'infer' answers.
> 
> That inference has no basis in dynamics, that is a model of reality
> that it can use to determine answers, rather it will, in essence,
> provide a random result.
> 
> If there is a great deal of input (e.g. C programs), then that
> inference is
> likely to manifest in very subtle errors. See [2] for a thoughtful
> exploration from an AI expert on the topic of statistics vs.
> dynamics, and [3] for a broader exploration of the topic from the
> same author.

Amazingly enough when you're trying to sell a new thing, you become
very down on what you see as the old thing (bcachefs vs btrfs ...?)

> 
> [1]:
> https://lore.kernel.org/workflows/20250724194556.105803db@gandalf.loc
> al.home/
> [2]:https://blog.piekniewski.info/2016/11/01/statistics-and-dynamics/
> [3]:https://blog.piekniewski.info/2023/04/09/ai-reflections/
> 
[...]
> > > * The kernel is uniquely sensitive to erroneous (especially
> > > subtly erroneous) code - even small errors can be highly
> > > consequential. We use a programming language that can almost be
> > > defined by its lack of any kind   of safety, and in some
> > > subsystems patches are simply taken if no obvious problems exist,
> > > making us rather vulnerable to this.
> > 
> > I think that's really overlooking the fact that if properly trained
> > (a somewhat big *if* depending on the model) AI should be very good
> > at writing safe code in unsafe languages.  However it takes C
> > specific
> 
> I fundamentally disagree.
> 
> The consequences of even extremely small mistakes can be very serious
> in C, as the language does little to nothing for you.
> 
> No matter how much data it absorbs it cannot span the entire space of
> all possible programs or even anywhere close.

Neither can a human, and we get by mostly on pattern matching ourselves
...

> I mean again, I apply the arguments above as to why I feel this is
> _fundamental_ to the approach.
> 
> Kernel code is also very specific and has characteristics that render
> it different from userland. We must consider a great many more things
> that would be handled for us were we userland - interrupts, the
> context we are in, locks of all varieties, etc. etc.
> 
> While there's a lot of kernel code (~10's of millions of line), for
> an LLM that is very small, and we simply cannot generate more.
> 
> Yes it can eat up all the C it can, but that isn't quite the same.

You seem to be assuming training is simply dumping the data corpus and
letting the model fend for itself.  It isn't; it's a more painstaking
process that finds the mistakes in the output and gets the model to
improve itself ... it is more like human teaching.

I'm not saying current AI is perfect, but I am saying that most of the
issues with current AI can be traced to training problems which can be
corrected in the model if anyone cares enough to do it.  The useful
signal is that in all badly trained models I've seen the AI confidence
score is really low because of the multiple matches in different areas
that proper training would separate.  That's why I think AI confidence
score should be the first thing we ask for.

Regards,

James


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 17:55   ` Lorenzo Stoakes
  2025-08-05 18:23     ` Lorenzo Stoakes
  2025-08-05 18:34     ` James Bottomley
@ 2025-08-05 18:39     ` Sasha Levin
  2025-08-05 19:15       ` Lorenzo Stoakes
  2025-08-06  4:04       ` Alexey Dobriyan
  2 siblings, 2 replies; 43+ messages in thread
From: Sasha Levin @ 2025-08-05 18:39 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: James Bottomley, ksummit

On Tue, Aug 05, 2025 at 06:55:29PM +0100, Lorenzo Stoakes wrote:
>On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
>> I think that's really overlooking the fact that if properly trained (a
>> somewhat big *if* depending on the model) AI should be very good at
>> writing safe code in unsafe languages.  However it takes C specific
>
>I fundamentally disagree.
>
>The consequences of even extremely small mistakes can be very serious in C,
>as the language does little to nothing for you.
>
>No matter how much data it absorbs it cannot span the entire space of all
>possible programs or even anywhere close.

Neither can a human :)

I think that this is where we see things differently: I don't think that
AI needs to be perfect, I just want it to be at the same level as (or
better than) a human.

Humans aren't great at writing C code. There's a reason we're looking at
using Rust for the kernel, and there's a reason that LTS trees exist -
they're living evidence of just how many mistakes humans make.

Look at the contents of LTS trees or the CVEs that get assigned: most of
them are fairly simple memory safety issues, off-by-one, use-after-free,
etc...

I don't think we should expect a bar for AI that is higher than the one
we set for humans.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:01     ` Lorenzo Stoakes
@ 2025-08-05 18:46       ` Mark Brown
  2025-08-05 19:18         ` Lorenzo Stoakes
  0 siblings, 1 reply; 43+ messages in thread
From: Mark Brown @ 2025-08-05 18:46 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: James Bottomley, ksummit

[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]

On Tue, Aug 05, 2025 at 07:01:01PM +0100, Lorenzo Stoakes wrote:

> However the point is that we can:

> - Generate scenarios relatively quickly that way.
> - Resulatant mistakes are at least _less_ impactful. And you'd hope, would come
>   out in the wash more easily.

> However, I wonder whether it might be better simply to restrict LLMs to test
> data generation or the like.

Yeah, test data seems much safer than test code since it's much more
constrained.  I'm definitely not 100% against trying things with code;
I'm more worried about people throwing something out there without
really looking at it properly because it's "just test code".  With
scenario generation, for example, the question is always whether we'd be
happier with the test program being a scenario generator - sometimes the
answer is no (eg, because it's too expensive to run and you really need
to pick scenarios), sometimes the answer is yes even though the
resulting test program is more involved.

> These are again all exactly the kinds of things that make this an important
> topic to discuss I believe :)

Indeed, and I'm going to guess that it's going to be as much about
establishing taste as firm rules.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:34     ` James Bottomley
@ 2025-08-05 18:55       ` Lorenzo Stoakes
  2025-08-12 13:50       ` Steven Rostedt
  1 sibling, 0 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 18:55 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit, Steven Rostedt

(remembering to +cc Steven this time)

On Tue, Aug 05, 2025 at 02:34:40PM -0400, James Bottomley wrote:
> On Tue, 2025-08-05 at 18:55 +0100, Lorenzo Stoakes wrote:
> > On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > > On Tue, 2025-08-05 at 17:03 +0100, Lorenzo Stoakes wrote:
> > > > Unavoidably, LLMs are the hot topic in tech right now, and are
> > > > here to stay.
> > > >
> > > > This poses unique problems:
> > > >
> > > > * Never before have people been able to generate as much content
> > > > that may, on a surface reading, seem valid whilst in reality
> > > > being quite the opposite.
> > > >
> > > > * Equally, LLM's can introduce very subtle mistakes that humans
> > > > find difficult to pick up upon - humans implicitly assume that
> > > > the classes of errors they will encounter are the kinds other
> > > > humans would make - AI defeats that instinct.
> > >
> > > Do you have any examples of this?  I've found the opposite to be
> > > true:
> >
> > Sure - Steven encountered this in [1].
> >
> > As he says there:
> >
> > "If I had known, I would have examined the patch a little more
> > thoroughly,  and would have discovered a very minor mistake in the
> > patch."
>
> Heh, well now you make me look it seems that the minor mistake is
> adding at tail instead of head?  That seems to be because the hash list
> API doesn't have a head add ...
>
> I wouldn't really call that a subtle problem because the LLM would have
> picked up the head to tail conversion if we'd had an at head API for it
> to learn from.

You see, I feel like whatever example I provide would provoke a response
like this :)

I also encountered an LLM insisting that MAINTAINERS contained a section
that doesn't exist - one that was subtly incorrect. 'It' insisted that
this was true and that I could check the file (it was wrong).

I've asked for explanations of concepts that it got confidently and
misleadingly wrong.

https://rdel.substack.com/p/rdel-57-what-are-the-most-common

is an article referencing the common bugs generated by code-generating
LLM machinery.

In interacting with chat bots I've encountered _very confidently_ stated
stuff that is convincing, were you not to be expert enough to determine
otherwise.

I could go and try to gather a bunch of examples (hey, this is a proposal
right? If it were accepted then I'd be able to spend time firming stuff up
like this ;)

But I come back to the fundamental point that we are statistically
inferring information against an infinite number of possibilities. It is
simply mathematically inevitable there will be gaps, and errors can very
conceivably be subtle as well as glaring.

Either are problematic.

>
> > The algorithm is determining likely output based on statistics, and
> > therefore density of input. Since in reality one can write infinite
> > programs, it's mathematically inevitable that an LLM will have to
> > 'infer' answers.
> >
> > That inference has no basis in dynamics, that is a model of reality
> > that it can use to determine answers, rather it will, in essence,
> > provide a random result.
> >
> > If there is a great deal of input (e.g. C programs), then that
> > inference is
> > likely to manifest in very subtle errors. See [2] for a thoughtful
> > exploration from an AI expert on the topic of statistics vs.
> > dynamics, and [3] for a broader exploration of the topic from the
> > same author.
>
> Amazingly enough when you're trying to sell a new thing, you become
> very down on what you see as the old thing (bcachefs vs btrfs ...?)

Come on James, ;) I think this is rather an unfair dismissal of those
articles that are well-reasoned and thoughtful.

I think the discussion around statistical inference vs. dynamic modelling
is fairly profound and insightful.

Also that comparison... ;)

>
> >
> > [1]:
> > https://lore.kernel.org/workflows/20250724194556.105803db@gandalf.loc
> > al.home/
> > [2]:https://blog.piekniewski.info/2016/11/01/statistics-and-dynamics/
> > [3]:https://blog.piekniewski.info/2023/04/09/ai-reflections/
> >
> [...]
> > > > * The kernel is uniquely sensitive to erroneous (especially
> > > > subtly erroneous) code - even small errors can be highly
> > > > consequential. We use a programming language that can almost be
> > > > defined by its lack of any kind   of safety, and in some
> > > > subsystems patches are simply taken if no obvious problems exist,
> > > > making us rather vulnerable to this.
> > >
> > > I think that's really overlooking the fact that if properly trained
> > > (a somewhat big *if* depending on the model) AI should be very good
> > > at writing safe code in unsafe languages.  However it takes C
> > > specific
> >
> > I fundamentally disagree.
> >
> > The consequences of even extremely small mistakes can be very serious
> > in C, as the language does little to nothing for you.
> >
> > No matter how much data it absorbs it cannot span the entire space of
> > all possible programs or even anywhere close.
>
> Neither can a human and we get by on mostly pattern matching ourselves
> ...

This is a very typical counterargument. The problem is that humans cannot
generate these kinds of errors at the scale LLMs can*, and humans implicitly
expect 'human-like' errors - an expectation we cannot rely on for this kind
of output.

We tend to have a fairly constrained set of errors that we make, which you
can usually reason about - and really, maintainers pattern match on errors
made as much as patch writers pattern match on writing them.

Breaking these assumptions in unusual ways is likely to be problematic.

*Excepting certain coccinelle contributors of course...

>
> > I mean again, I apply the arguments above as to why I feel this is
> > _fundamental_ to the approach.
> >
> > Kernel code is also very specific and has characteristics that render
> > it different from userland. We must consider a great many more things
> > that would be handled for us were we userland - interrupts, the
> > context we are in, locks of all varieties, etc. etc.
> >
> > While there's a lot of kernel code (~10's of millions of line), for
> > an LLM that is very small, and we simply cannot generate more.
> >
> > Yes it can eat up all the C it can, but that isn't quite the same.
>
> You seem to be assuming training is simply dump the data corpus and let
> the model fend for itself.  It isn't it's a more painstaking process
> that finds the mistakes in the output and gets the model to improve
> itself ... it is more like human teaching.

No, I assume that statistical inference cannot be established for an
effectively infinite problem space, which I think is reasonable.

>
> I'm not saying current AI is perfect, but I am saying that most of the
> issues with current AI can be traced to training problems which can be
> corrected in the model if anyone cares enough to do it.  The useful
> signal is that in all badly trained models I've seen the AI confidence
> score is really low because of the multiple matches in different areas
> that proper training would separate.  That's why I think AI confidence
> score should be the first thing we ask for.

Again, I've no issue with this confidence score as a data point, though we
do need to assess how reliable it is.

>
> Regards,
>
> James
>

I think we're diverging a little from the broader point being made here -
that we need a clear policy on this - into details about what kinds of
problems LLMs pose.

So even if we agree to disagree on some of these details, I feel we can
(probably? :) agree on the need for a coherent approach and a clear policy
on this.

And to be clear, I'm not opposing LLMs per se, I'm simply underlining the
kinds of issues we ought to be cautious of.

Ultimately I think we ought to let individual maintainers decide what they
will/won't accept (within reason).

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:39     ` Sasha Levin
@ 2025-08-05 19:15       ` Lorenzo Stoakes
  2025-08-05 20:02         ` James Bottomley
  2025-08-06  4:04       ` Alexey Dobriyan
  1 sibling, 1 reply; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 19:15 UTC (permalink / raw)
  To: Sasha Levin; +Cc: James Bottomley, ksummit

On Tue, Aug 05, 2025 at 02:39:20PM -0400, Sasha Levin wrote:
> On Tue, Aug 05, 2025 at 06:55:29PM +0100, Lorenzo Stoakes wrote:
> > On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > > I think that's really overlooking the fact that if properly trained (a
> > > somewhat big *if* depending on the model) AI should be very good at
> > > writing safe code in unsafe languages.  However it takes C specific
> >
> > I fundamentally disagree.
> >
> > The consequences of even extremely small mistakes can be very serious in C,
> > as the language does little to nothing for you.
> >
> > No matter how much data it absorbs it cannot span the entire space of all
> > possible programs or even anywhere close.
>
> Neither can a human :)
>
> I think that this is where we see things differently: I don't think that
> AI needs to be perfect, I just want it to be at the same level (or
> better) than a human.

Not at all, none of my objections are about perfection. I use LLMs myself,
in appropriate circumstances where the expected failings are not
problematic.

My objections are to do with the kinds of errors one can encounter with
statistical inference like this.

Humans do not confidently hallucinate in the absence of concrete data,
rather we infer and model. This is dynamics vs. statistics (I genuinely
recommend the article I linked to James, it's a fascinating insight - [0]).

It's the _nature_ of these errors that I am concerned about - in conjunction
with unsafe development tooling and the highly consequential results of even
subtle errors, it's what makes the kernel especially problematic in my view.

>
> Humans aren't great at writing C code. There's a reason we're looking at
> using Rust for the kernel, and there's a reason that LTS trees exist -
> they're living evidence of just how many mistakes humans make.

Humans make human-like errors, and not at industrial scale :)

>
> Look at the contents of LTS trees or the CVEs that get assigned: most of
> them are fairly simple memory safety issues, off-by-one, use-after-free,
> etc...

Absolutely.

>
> I don't think we should expect a bar for AI that is higher than the one
> we set for humans.

I'm not, rather I'm saying let's be aware of the kinds of issues we might
encounter from LLMs and take them into account when establishing policy.

>
> --
> Thanks,
> Sasha

Regardless of where we feel the issues may/may not be, I think we can
certainly agree that we ought to have a policy on how we approach
LLM-generated content, and determine broadly what maintainers want to do
with it.

Which I think is the _key_ point of the proposal more than anything! :)

Cheers, Lorenzo

[0]:https://blog.piekniewski.info/2016/11/01/statistics-and-dynamics/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:46       ` Mark Brown
@ 2025-08-05 19:18         ` Lorenzo Stoakes
  0 siblings, 0 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 19:18 UTC (permalink / raw)
  To: Mark Brown; +Cc: James Bottomley, ksummit

On Tue, Aug 05, 2025 at 07:46:24PM +0100, Mark Brown wrote:
> On Tue, Aug 05, 2025 at 07:01:01PM +0100, Lorenzo Stoakes wrote:
>
> > However the point is that we can:
>
> > - Generate scenarios relatively quickly that way.
> > - Resultant mistakes are at least _less_ impactful. And, you'd hope, would come
> >   out in the wash more easily.
>
> > However, I wonder whether it might be better simply to restrict LLMs to test
> > data generation or the like.
>
> Yeah, test data seems much safer than test code since it's much more
> constrained.  I'm definitely not 100% against trying things with code,
> I'm more worried about people throwing something out there without
> really looking at it properly since it's just test code.  With the
> scenario generation for example the question is always if we'd be
> happier with the test program being a scenario generator - sometimes the
> answer is no (eg, because it's too expensive to run and you really need
> to pick scenarios), sometimes the answer is yes even though the
> resulting test program is more involved.

And equally, those reviewing test code, such as yourself, are a finite
resource, so a sudden influx of patches resulting from this is also a
consideration.

>
> > These are again all exactly the kinds of things that make this an important
> > topic to discuss I believe :)
>
> Indeed, and I'm going to guess that it's going to be as much about
> establishing taste as firm rules.

Absolutely agree :)

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 19:15       ` Lorenzo Stoakes
@ 2025-08-05 20:02         ` James Bottomley
  2025-08-05 20:48           ` Al Viro
  2025-08-06 19:26           ` Lorenzo Stoakes
  0 siblings, 2 replies; 43+ messages in thread
From: James Bottomley @ 2025-08-05 20:02 UTC (permalink / raw)
  To: Lorenzo Stoakes, Sasha Levin; +Cc: ksummit

On Tue, 2025-08-05 at 20:15 +0100, Lorenzo Stoakes wrote:
> On Tue, Aug 05, 2025 at 02:39:20PM -0400, Sasha Levin wrote:
> > On Tue, Aug 05, 2025 at 06:55:29PM +0100, Lorenzo Stoakes wrote:
> > > On Tue, Aug 05, 2025 at 12:43:38PM -0400, James Bottomley wrote:
> > > > I think that's really overlooking the fact that if properly
> > > > trained (a somewhat big *if* depending on the model) AI should
> > > > be very good at writing safe code in unsafe languages.  However
> > > > it takes C specific
> > > 
> > > I fundamentally disagree.
> > > 
> > > The consequences of even extremely small mistakes can be very
> > > serious in C, as the language does little to nothing for you.
> > > 
> > > No matter how much data it absorbs it cannot span the entire
> > > space of all possible programs or even anywhere close.
> > 
> > Neither can a human :)
> > 
> > I think that this is where we see things differently: I don't think
> > that AI needs to be perfect, I just want it to be at the same level
> > (or better) than a human.
> 
> Not at all, none of my objections are about perfection. I use LLMs
> myself, in appropriate circumstances where the expected failings are
> not problematic.
> 
> My objections are to do with the kinds of errors one can
> encounter with statistical inference like this.
> 
> Humans do not confidently hallucinate in the absence of concrete
> data, rather we infer and model.

Might I refer you to pretty much any white house press briefing for
counter examples ...

>  This is dynamics vs. statistics (I genuinely recommend the article I
> linked to James, it's a fascinating insight - [0]).
> 
> It's the _nature_ of these errors that I am concerned about in
> conjunction with unsafe development tooling and highly consequential
> results of even subtle errors that makes the kernel especially
> problematic in my view.

You know that's an argument for not allowing teenagers to learn to
drive (at any age), or ever operate heavy machinery, or ...

The point being that with enough training human society thinks the
probability of error is remote enough (for some value of enough) to
become an acceptable risk.

> > Humans aren't great at writing C code. There's a reason we're
> > looking at using Rust for the kernel, and there's a reason that LTS
> > trees exist - they're living evidence of just how many mistakes
> > humans make.
> 
> Humans make human-like errors, and not at industrial scale :)

I've an infinite number of monkeys^WSet of Far Eastern Call centres for
appliance repair that would beg to disagree.

> > Look at the contents of LTS trees or the CVEs that get assigned:
> > most of them are fairly simple memory safety issues, off-by-one,
> > use-after-free, etc...
> 
> Absolutely.
> 
> > 
> > I don't think we should expect a bar for AI that is higher than the
> > one we set for humans.
> 
> I'm not, rather I'm saying let's be aware of the kinds of issues we
> might encounter from LLMs and take them into account when
> establishing policy.

Well, if we set a policy, it should be flexible enough to adapt as the
AI does and not be locked to what would prevent the AI mistakes I can
find today from happening.  If we're going to codify this rigidly we
could arguably have a policy not to accept patches from humans who
might be (and often are) wrong as well.

I think we should stick to indicators of trustworthiness that AI is
already generating and let that guide maintainer taste without
necessarily having something more detailed.

Regards,

James


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 20:02         ` James Bottomley
@ 2025-08-05 20:48           ` Al Viro
  2025-08-06 19:26           ` Lorenzo Stoakes
  1 sibling, 0 replies; 43+ messages in thread
From: Al Viro @ 2025-08-05 20:48 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lorenzo Stoakes, Sasha Levin, ksummit

On Tue, Aug 05, 2025 at 04:02:02PM -0400, James Bottomley wrote:

> > My objections are to do with the kinds of errors one can
> > encounter with statistical inference like this.
> > 
> > Humans do not confidently hallucinate in the absence of concrete
> > data, rather we infer and model.
> 
> Might I refer you to pretty much any white house press briefing for
> counter examples ...

	Thank you.  Do we really want their kind anywhere near the
codebase, though?  How would you treat someone ever caught pulling that
kind of shit?  Personally I'd classify them as indistinguishable from
scammers and that would be it, reducing the problem to recognizing their
sock puppets if they ever get such.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 16:43 ` James Bottomley
                     ` (2 preceding siblings ...)
  2025-08-05 17:55   ` Lorenzo Stoakes
@ 2025-08-05 21:58   ` Jiri Kosina
  2025-08-06  6:58     ` Hannes Reinecke
  2025-08-06 19:35     ` Lorenzo Stoakes
  3 siblings, 2 replies; 43+ messages in thread
From: Jiri Kosina @ 2025-08-05 21:58 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lorenzo Stoakes, ksummit

On Tue, 5 Aug 2025, James Bottomley wrote:

> > * Equally, LLM's can introduce very subtle mistakes that humans find
> > difficult to pick up upon - humans implicitly assume that the classes
> > of errors they will encounter are the kinds other humans would make -
> > AI defeats that instinct.
> 
> Do you have any examples of this?  I've found the opposite to be true:
> AI is capable of really big stupid mistakes when it hasn't seen enough
> of the pattern, but I can't recall seeing it make something you'd
> classify as a subtle mistake (I assume it could copy subtle mistakes
> from wrong training data, so I'm not saying it can't, just that I
> haven't seen any).

Few months ago (so state of matters has probably changed a lot since 
then, I have no idea), I've seen a GPIO driver fully written by ChatGPT.

It was functionally correct and worked well. But it was missing error
handling here and there. Is it subtle or big?
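
To give a flavour of the pattern (a made-up sketch from memory, not the
actual generated driver - the names and the specific omissions are purely
illustrative):

/* Hypothetical sketch only - not the actual ChatGPT-generated driver. */
#include <linux/err.h>
#include <linux/gpio/driver.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/slab.h>

struct foo_gpio {
	struct gpio_chip chip;
	void __iomem *base;
};

static int foo_gpio_probe(struct platform_device *pdev)
{
	struct foo_gpio *priv;

	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	/* Missing: if (!priv) return -ENOMEM; */

	priv->base = devm_platform_ioremap_resource(pdev, 0);
	/* Missing: if (IS_ERR(priv->base)) return PTR_ERR(priv->base); */

	priv->chip.parent = &pdev->dev;

	/* The happy path works, so the driver looks "correct" when tested. */
	return devm_gpiochip_add_data(&pdev->dev, &priv->chip, priv);
}

Each individual omission is trivial, and nothing jumps out on a quick
read - which is rather the point.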

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:39     ` Sasha Levin
  2025-08-05 19:15       ` Lorenzo Stoakes
@ 2025-08-06  4:04       ` Alexey Dobriyan
  2025-08-06 20:36         ` Sasha Levin
  1 sibling, 1 reply; 43+ messages in thread
From: Alexey Dobriyan @ 2025-08-06  4:04 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Lorenzo Stoakes, James Bottomley, ksummit

On Tue, Aug 05, 2025 at 02:39:20PM -0400, Sasha Levin wrote:
> I don't think we should expect a bar for AI that is higher than the one
> we set for humans.

We absolutely should.

There is a thing with computers: they are perfect. They are so
infinitely perfect compared to humans that anything downgrading computers'
level of perfection to a human level of perfection must be rejected,
because it can't be scaled back and thus will be a permanent loss.

Would you install a calculator application which you know reports wrong
results 1% of the time?

Would you uninstall a calculator after realizing it has a 1% bug rate?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:10 ` H. Peter Anvin
  2025-08-05 18:19   ` Lorenzo Stoakes
@ 2025-08-06  5:49   ` Julia Lawall
  2025-08-06  9:25     ` Dan Carpenter
  1 sibling, 1 reply; 43+ messages in thread
From: Julia Lawall @ 2025-08-06  5:49 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Lorenzo Stoakes, ksummit



On Tue, 5 Aug 2025, H. Peter Anvin wrote:

> On August 5, 2025 9:03:18 AM PDT, Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> >Unavoidably, LLMs are the hot topic in tech right now, and are here to
> >stay.
> >
> >This poses unique problems:
> >
> >* Never before have people been able to generate as much content that may,
> >  on a surface reading, seem valid whilst in reality being quite the
> >  opposite.
> >
> >* Equally, LLM's can introduce very subtle mistakes that humans find
> >  difficult to pick up upon - humans implicitly assume that the classes of
> >  errors they will encounter are the kinds other humans would make - AI
> >  defeats that instinct.
> >
> >* The kernel is uniquely sensitive to erroneous (especially subtly
> >  erroneous) code - even small errors can be highly consequential. We use a
> >  programming language that can almost be defined by its lack of any kind
> >  of safety, and in some subsystems patches are simply taken if no obvious
> >  problems exist, making us rather vulnerable to this.
> >
> >* On the other hand, there are use cases which are useful - test data/code
> >  generation, summarisation, smart auto-complete - so it'd perhaps be
> >  foolish to entirely dismiss AI.
> >
> >A very important non-technical point we must consider is that, the second
> >we even appear to be open to AI submission of _any_ kind, the press will
> >inevitably report on it gleefully, likely with oversimplified headlines
> >like 'Linux accepts AI patches'.
> >
> >The moment that happens, we are likely to see a significant uptick in AI
> >submissions whether we like it or not.
> >
> >I propose that we establish the broad rules as they pertain to the kernel,
> >and would like to bring the discussion to the Maintainer's Summit so we can
> >determine what those should be.
> >
> >It's important to get a sense of how maintainers feel about this - whether
> >what is proposed is opt-in or opt-out - and how we actually implement this.
> >
> >There has been discussion on-list about this (see [0]), with many
> >suggestions made including a 'traffic light' system per-subsystem, however
> >many open questions remain - the devil is in the details.
> >
> >[0]:https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.org/
> >
>
> It would seem to me to be far more useful to get AI to help find hard-to-see but perhaps common problems, or to tidy up code (like Coccinelle on steroids.)
>
> Things where it is easy to see post hoc that they are correct, but which can be hard or laborious to do.
>
> Another genuinely good usage for AI is for especially non-English speakers to tidy up their patch comments and other documentation.

There are also some parts of the kernel that are supposed to have
comments, such as memory barriers.  AI could help ensure that the comments
are actually meaningful, and perhaps suggest to the developer how they
could be improved (sort of a 0-day for comment quality).
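
For instance, a meaningful barrier comment states what is being ordered and
which barrier it pairs with - something like the invented snippet below (the
structure and names are made up; it is only meant to illustrate the kind of
comment checkpatch already complains about when it is missing entirely):

#include <asm/barrier.h>
#include <linux/compiler.h>

/* Invented example - a trivial single-producer ring. */
struct ring {
	unsigned int tail;
	int data[64];
};

static void ring_publish(struct ring *r, unsigned int tail, int value)
{
	r->data[tail % 64] = value;

	/*
	 * Make the new entry visible before publishing the updated tail
	 * index; pairs with the smp_rmb() on the consumer side between
	 * reading r->tail and reading r->data[].
	 */
	smp_wmb();
	WRITE_ONCE(r->tail, tail + 1);
}

A comment that just says "make sure writes are ordered" is exactly the sort
of thing such a tool could flag as unhelpful.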

julia

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 21:58   ` Jiri Kosina
@ 2025-08-06  6:58     ` Hannes Reinecke
  2025-08-06 19:36       ` Lorenzo Stoakes
  2025-08-06 19:35     ` Lorenzo Stoakes
  1 sibling, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2025-08-06  6:58 UTC (permalink / raw)
  To: Jiri Kosina, James Bottomley; +Cc: Lorenzo Stoakes, ksummit

On 8/5/25 23:58, Jiri Kosina wrote:
> On Tue, 5 Aug 2025, James Bottomley wrote:
> 
>>> * Equally, LLM's can introduce very subtle mistakes that humans find
>>> difficult to pick up upon - humans implicitly assume that the classes
>>> of errors they will encounter are the kinds other humans would make -
>>> AI defeats that instinct.
>>
>> Do you have any examples of this?  I've found the opposite to be true:
>> AI is capable of really big stupid mistakes when it hasn't seen enough
>> of the pattern, but I can't recall seeing it make something you'd
>> classify as a subtle mistake (I assume it could copy subtle mistakes
>> from wrong training data, so I'm not saying it can't, just that I
>> haven't seen any).
> 
> Few months ago (so state of matters has probably changed a lot since
> then, I have no idea), I've seen a GPIO driver fully written by ChatGPT.
> 
> It was functionally correct and worked well. But it was missing error
> handling here and there. Is it subtle or big?
> 
Which (curiously enough) raises another point: Procrastination.
We (as developers) are _terrible_ at things which we _should_ do,
but for some reason or other never get around to doing properly
or at all. To wit: Error handling. Or: documentation.

Yet the code base we have written will be used as training material
for LLMs. So I am not surprised that LLMs lack in the very same
areas (how could it be otherwise? There is not enough training
material...). And I guess the last thing we want is LLMs inventing
error handling.

But anyway. LLMs are great at producing something where 90% accuracy
is good enough. But that's not how coding works, so one will always
have to review any LLM-generated code.
And that at a time when we already have issues finding enough reviewers
for code written by humans.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.com                               +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-06  5:49   ` Julia Lawall
@ 2025-08-06  9:25     ` Dan Carpenter
  2025-08-06  9:39       ` Julia Lawall
  2025-08-06 19:30       ` Lorenzo Stoakes
  0 siblings, 2 replies; 43+ messages in thread
From: Dan Carpenter @ 2025-08-06  9:25 UTC (permalink / raw)
  To: Julia Lawall; +Cc: H. Peter Anvin, Lorenzo Stoakes, ksummit

On Wed, Aug 06, 2025 at 07:49:24AM +0200, Julia Lawall wrote:
> 
> 
> On Tue, 5 Aug 2025, H. Peter Anvin wrote:
> >
> > Another genuinely good usage for AI is for especially non-English
> > speakers to tidy up their patch comments and other documentation.
> 
> There are also some parts of the kernel that are supposed to have
> comments, such as memory barriers.  AI could help ensure that the comments
> are actually meaningful, and perhaps suggest to the developer how they
> could be improved (sort of a 0-day for comment quality).
> 

I feel like I have seen patches where people have generated AI
documentation for locking.  The problem is that if you ask AI to write
something it always has a very confident answer but normally it's vague
and slightly wrong.  It takes no time to generate these patches but it
takes a while to review them.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-06  9:25     ` Dan Carpenter
@ 2025-08-06  9:39       ` Julia Lawall
  2025-08-06 19:30       ` Lorenzo Stoakes
  1 sibling, 0 replies; 43+ messages in thread
From: Julia Lawall @ 2025-08-06  9:39 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: H. Peter Anvin, Lorenzo Stoakes, ksummit



On Wed, 6 Aug 2025, Dan Carpenter wrote:

> On Wed, Aug 06, 2025 at 07:49:24AM +0200, Julia Lawall wrote:
> >
> >
> > On Tue, 5 Aug 2025, H. Peter Anvin wrote:
> > >
> > > Another genuinely good usage for AI is for especially non-English
> > > speakers to tidy up their patch comments and other documentation.
> >
> > There are also some parts of the kernel that are supposed to have
> > comments, such as memory barriers.  AI could help ensure that the comments
> > are actually meaningful, and perhaps suggest to the developer how they
> > could be improved (sort of a 0-day for comment quality).
> >
>
> I feel like I have seen patches where people have generated AI
> documentation for locking.  The problem is that if you ask AI to write
> something it always has a very confident answer but normally it's vague
> and slightly wrong.  It takes no time to generate these patches but it
> takes a while to review them.

I would have hoped for some tool that would somehow be designed to
understand comments in a good way.  Random individual efforts to generate
comments using AI would probably indeed produce something innocuous and
useless.

julia

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 20:02         ` James Bottomley
  2025-08-05 20:48           ` Al Viro
@ 2025-08-06 19:26           ` Lorenzo Stoakes
  2025-08-07 12:25             ` Mark Brown
  1 sibling, 1 reply; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-06 19:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: Sasha Levin, ksummit

On Tue, Aug 05, 2025 at 04:02:02PM -0400, James Bottomley wrote:
> > >
> > > I don't think we should expect a bar for AI that is higher than the
> > > one we set for humans.
> >
> > I'm not, rather I'm saying let's be aware of the kinds of issues we
> > might encounter from LLMs and take them into account when
> > establishing policy.
>
> Well, if we set a policy, it should be flexible enough to adapt as the
> AI does and not be locked to what would prevent the AI mistakes I can
> find today from happening.  If we're going to codify this rigidly we
> could arguably have a policy not to accept patches from humans who
> might be (and often are) wrong as well.

Sure, I think any policy should be broad and reasonable.

Probably we want something simple and practical to begin with,
e.g. categorising by:

1. Was most or all of this patch generated by an LLM? (>=90%)
2. Was a large part of this patch generated by an LLM? (>30%)
3. Was a small part of this patch generated by an LLM? (<30%)

In addition to:

- Was the commit message of this patch generated in large part by an LLM
  (excluding non-native speakers using an LLM to simply assist writing it
  in english)?

All of which could have tags, and each entry in MAINTAINERS could have an
opt-in entry indicating which will be acceptable.
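
Purely as a sketch of the shape of this - the tag names and the MAINTAINERS
field below are invented for illustration, not a proposal for actual
syntax - a patch might carry something like:

  AI-Assisted: code-most          (category 1, >=90% of the diff)
  AI-Assisted: commit-message

and a subsystem could opt in via something like:

  AI:	code-small, commit-message

in its MAINTAINERS entry. The exact naming doesn't matter much - the point
is that it's cheap for submitters to declare and cheap for maintainers to
filter on.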

We could then explicitly indicate that we're fine with and no need to
disclose uses that are simple day-to-day use of LLM tools such as:

- Simple, supervised use of LLM-based 'smart' autocomplete features.
- Research being assisted by an LLM.
- Any use of an LLM for non-upstreamed code used in development of the
  series.

etc.

Then we can leave the decision as to what's acceptable to individual
maintainers.

>
> I think we should stick to indicators of trustworthiness that AI is
> already generating and let that guide maintainer taste without
> necessarily having something more detailed.

Well, it's an interesting data point but I'm not sure asking the LLM to
rate its own trustworthiness is a reliable measure, and at any rate I think
we need to keep things simple to begin with.

>
> Regards,
>
> James
>

A really key thing to consider here too is maintainer resource. We're
already strained on this with human submissions, so perhaps we want to make
it very clear in the AI policy document that this is emphatically not an
invitation to point automated tools at the kernel and generate tonnes of
patches, and that trying to do so might result in your patches being ignored.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-06  9:25     ` Dan Carpenter
  2025-08-06  9:39       ` Julia Lawall
@ 2025-08-06 19:30       ` Lorenzo Stoakes
  2025-08-12 14:37         ` Steven Rostedt
  1 sibling, 1 reply; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-06 19:30 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Julia Lawall, H. Peter Anvin, ksummit

On Wed, Aug 06, 2025 at 12:25:59PM +0300, Dan Carpenter wrote:
> On Wed, Aug 06, 2025 at 07:49:24AM +0200, Julia Lawall wrote:
> >
> >
> > On Tue, 5 Aug 2025, H. Peter Anvin wrote:
> > >
> > > Another genuinely good usage for AI is for especially non-English
> > > speakers to tidy up their patch comments and other documentation.
> >
> > There are also some parts of the kernel that are supposed to have
> > comments, such as memory barriers.  AI could help ensure that the comments
> > are actually meaningful, and perhaps suggest to the developer how they
> > could be improved (sort of a 0-day for comment quality).
> >
>
> I feel like I have seen patches where people have generated AI
> documentation for locking.  The problem is that if you ask AI to write
> something it always has a very confident answer but normally it's vague
> and slightly wrong.  It takes no time to generate these patches but it
> takes a while to review them.

The asymmetric relationship between effort to generate vs. effort to review
is a very serious aspect of all this.

It's why I think it's absolutely key to make all this _opt-in_ and to
empower individual maintainers to decide how to handle these kinds of
patches.

>
> regards,
> dan carpenter

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 21:58   ` Jiri Kosina
  2025-08-06  6:58     ` Hannes Reinecke
@ 2025-08-06 19:35     ` Lorenzo Stoakes
  1 sibling, 0 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-06 19:35 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: James Bottomley, ksummit

On Tue, Aug 05, 2025 at 11:58:19PM +0200, Jiri Kosina wrote:
> On Tue, 5 Aug 2025, James Bottomley wrote:
>
> > > * Equally, LLM's can introduce very subtle mistakes that humans find
> > > difficult to pick up upon - humans implicitly assume that the classes
> > > of errors they will encounter are the kinds other humans would make -
> > > AI defeats that instinct.
> >
> > Do you have any examples of this?  I've found the opposite to be true:
> > AI is capable of really big stupid mistakes when it hasn't seen enough
> > of the pattern, but I can't recall seeing it make something you'd
> > classify as a subtle mistake (I assume it could copy subtle mistakes
> > from wrong training data, so I'm not saying it can't, just that I
> > haven't seen any).
>
> Few months ago (so state of matters has probably changed a lot since
> then, I have no idea), I've seen a GPIO driver fully written by ChatGPT.
>
> It was functionally correct and worked well. But it was missing error
> handling here and there. Is it subtle or big?

Right exactly :)

In any case we can all agree that LLMs hallucinate and create errors.

And therefore being able to categorise the degree to which LLMs
participated in a patch easily empowers maintainers to determine how much
effort they're willing to put in to try to track these down.

So either way we need this I think.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-06  6:58     ` Hannes Reinecke
@ 2025-08-06 19:36       ` Lorenzo Stoakes
  0 siblings, 0 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-06 19:36 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Jiri Kosina, James Bottomley, ksummit

On Wed, Aug 06, 2025 at 08:58:20AM +0200, Hannes Reinecke wrote:
> But anyway. LLMs are great in producing something where 90% accuracy
> is good enough. But that's not how coding works; so one always will
> have to review any LLM generated code.
> And that at a time where we already have issues finding enough reviewers
> for code written by humans.

Yes, precisely. Maintainer resource is strained as it is (he writes at
8.35pm :) - we simply do not have the bandwidth to handle an uncontrolled
influx where there's an asymmetric relationship between ability-to-generate
and ability-to-review.

So empowering maintainers to _decide_ what's acceptable per-subsystem is
key, I think.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-06  4:04       ` Alexey Dobriyan
@ 2025-08-06 20:36         ` Sasha Levin
  0 siblings, 0 replies; 43+ messages in thread
From: Sasha Levin @ 2025-08-06 20:36 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Lorenzo Stoakes, James Bottomley, ksummit

On Wed, Aug 06, 2025 at 07:04:02AM +0300, Alexey Dobriyan wrote:
>On Tue, Aug 05, 2025 at 02:39:20PM -0400, Sasha Levin wrote:
>> I don't think we should expect a bar for AI that is higher than the one
>> we set for humans.
>
>We absolutely should.
>
>There is a thing with computers: they are perfect. They are so
>infinitely perfect compared to humans that anything downgrading computers'
>level of perfection to a human level of perfection must be rejected,
>because it can't be scaled back and thus will be a permanent loss.
>
>Would you install a calculator application which you know reports wrong
>results 1% of the time?

Let's look at the last release and see :)

$ git log --oneline --no-merges -i --grep ^fixes: --grep cc:.*stable v6.15..v6.16 | wc -l
2734
$ git log --oneline --no-merges v6.15..v6.16 | wc -l
14639

So at least ~20% of the commits that go in are fixes for issues - one
can make the argument that 20% of the kernel is "broken" :)

In your calculator example, if the calculator is making an error 1% of
the time doing simple addition, I would not buy it. If it's making an
error 1% of the time in an edge case of something I probably would never
use? Sure, I'd happily take the calculator.

>Would you uninstall a calculator after realizing it has a 1% bug rate?

I'm assuming we both agree that the kernel has over 1% "bug rate"
throughout all of the various usecases it's being used in, why haven't
you uninstalled it yet?

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-06 19:26           ` Lorenzo Stoakes
@ 2025-08-07 12:25             ` Mark Brown
  2025-08-07 13:00               ` Lorenzo Stoakes
  0 siblings, 1 reply; 43+ messages in thread
From: Mark Brown @ 2025-08-07 12:25 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: James Bottomley, Sasha Levin, ksummit

[-- Attachment #1: Type: text/plain, Size: 561 bytes --]

On Wed, Aug 06, 2025 at 08:26:41PM +0100, Lorenzo Stoakes wrote:

> - Was the commit message of this patch generated in large part by an LLM
>   (excluding non-native speakers using an LLM to simply assist writing it
>   in english)?

Easiest way to say that is probably to say "written by" and "translated
by".  I think for all these things we should just talk about tooling
rather than specifically LLMs, as well as avoiding any rules lawyering
about how precisely a given tool is implemented it's probably useful to
know about tools whatever the technology.
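
e.g. (tag names invented purely for illustration) something like

  Commit-message-written-by: <tool>
  Commit-message-translated-by: <tool>

keeps that distinction without caring how the tool happens to be
implemented.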

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-07 12:25             ` Mark Brown
@ 2025-08-07 13:00               ` Lorenzo Stoakes
  2025-08-11 21:26                 ` Luis Chamberlain
  2025-08-12 14:19                 ` Steven Rostedt
  0 siblings, 2 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-08-07 13:00 UTC (permalink / raw)
  To: Mark Brown; +Cc: James Bottomley, Sasha Levin, ksummit

On Thu, Aug 07, 2025 at 01:25:23PM +0100, Mark Brown wrote:
> On Wed, Aug 06, 2025 at 08:26:41PM +0100, Lorenzo Stoakes wrote:
>
> > - Was the commit message of this patch generated in large part by an LLM
> >   (excluding non-native speakers using an LLM to simply assist writing it
> >   in english)?
>
> Easiest way to say that is probably to say "written by" and "translated
> by".  I think for all these things we should just talk about tooling
> rather than specifically LLMs, as well as avoiding any rules lawyering
> about how precisely a given tool is implemented it's probably useful to
> know about tools whatever the technology.

That's a great idea!!

And agreed on language/rules lawyering - I think we have to have something
_simple_ and robust, at least to begin with.

There are inevitably blurry lines everywhere with this stuff, so I think we
have to be pragmatic and as clear as we possibly can be. We can always
refine things later as we learn.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-07 13:00               ` Lorenzo Stoakes
@ 2025-08-11 21:26                 ` Luis Chamberlain
  2025-08-12 14:19                 ` Steven Rostedt
  1 sibling, 0 replies; 43+ messages in thread
From: Luis Chamberlain @ 2025-08-11 21:26 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: Mark Brown, James Bottomley, Sasha Levin, ksummit

On Thu, Aug 07, 2025 at 02:00:56PM +0100, Lorenzo Stoakes wrote:
> On Thu, Aug 07, 2025 at 01:25:23PM +0100, Mark Brown wrote:
> > On Wed, Aug 06, 2025 at 08:26:41PM +0100, Lorenzo Stoakes wrote:
> >
> > > - Was the commit message of this patch generated in large part by an LLM
> > >   (excluding non-native speakers using an LLM to simply assist writing it
> > >   in english)?
> >
> > Easiest way to say that is probably to say "written by" and "translated
> > by".  I think for all these things we should just talk about tooling
> > rather than specifically LLMs, as well as avoiding any rules lawyering
> > about how precisely a given tool is implemented it's probably useful to
> > know about tools whatever the technology.
> 
> That's a great idea!!
> 
> And agreed on language/rules lawyering, I think we have to have something
> _simple_ and robust at least to begin with.

I've been using the "Generated-by" tag for years, starting with Coccinelle:

git log --oneline --author="mcgrof" --grep "Generated-by"| wc -l
31

And it seems like I'm not the only one:

git log --oneline --grep "Generated-by"| wc -l
49

For other projects such as kdevops where I *only* use LLMs to write new
code now, we have been using:

Generated-by: ChatGPT Codex
Generated-by: Claude AI

We use this even if the code was only partially AI-generated. I think that
gives maintainers sufficient information to make judgement calls.

Other than this practice, if we're to slowly and carefully welcome LLM
generated code in the kernel, I'd recommend we evaluate a context
initialization file.  For Claude that's CLAUDE.md; you can look at
kdevops's file for an example [0]. Having one makes adoption easier, and you
can provide strict rules. The context is limited though, you want about
~40 KiB. However I'm not sure a generic one would be so easily
agreed upon, so fortunately the bots can also look for your ~/CLAUDE.md.
But I can vouch for the fact that it's proven useful for kdevops.

Other than this, another best practice we've adopted on kdevops is to
grade commits based on LLM prompts, so as to keep tabs on how well LLMs
improve over time with example prompts; to track them with full prompts we
have PROMPTS.md [1]. These can help LLMs as well.

The grammatical evolution on kdevops is what makes LLM adoption today
easily possible [2]. I don't think it's as solid yet for kernel development,
in agreement with recent findings [3], however it's only getting better, so
it's best we prepare for it and learn from existing projects' use cases.

So for testing -- clearly it's a win.  For other things, here are a few
areas where we can evaluate future success over time:

  * fix syzbot bugs
  * take on maintenance for orphaned drivers
  * help maintainers with patch review / testing

[0] https://github.com/linux-kdevops/kdevops/blob/main/CLAUDE.md
[1] https://github.com/linux-kdevops/kdevops/blob/main/PROMPTS.md
[2] https://github.com/linux-kdevops/kdevops?tab=readme-ov-file#generative-ai-usage-on-kdevops
[3] https://neurips.cc/virtual/2024/poster/97426

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:23     ` Lorenzo Stoakes
@ 2025-08-12 13:44       ` Steven Rostedt
  0 siblings, 0 replies; 43+ messages in thread
From: Steven Rostedt @ 2025-08-12 13:44 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: James Bottomley, ksummit

On Tue, 5 Aug 2025 19:23:47 +0100
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> +cc Steven
> 
> Apologies, I meant to +cc you here as I mention an issue you encountered,
> and forgot of course :>)

Just got back from PTO and now catching up through my massive heap of emails.

No worries, I'm subscribed to the ksummit list and already read your email
before I even saw this.

-- Steve

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-05 18:34     ` James Bottomley
  2025-08-05 18:55       ` Lorenzo Stoakes
@ 2025-08-12 13:50       ` Steven Rostedt
  1 sibling, 0 replies; 43+ messages in thread
From: Steven Rostedt @ 2025-08-12 13:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lorenzo Stoakes, ksummit

On Tue, 05 Aug 2025 14:34:40 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> > Sure - Steven encountered this in [1].
> > 
> > As he says there:
> > 
> > "If I had known, I would have examined the patch a little more
> > thoroughly,  and would have discovered a very minor mistake in the
> > patch."  
> 
> Heh, well now you make me look it seems that the minor mistake is
> adding at tail instead of head?  That seems to be because the hash list
> API doesn't have a head add ...
> 
> I wouldn't really call that a subtle problem because the LLM would have
> picked up the head to tail conversion if we'd had an at head API for it
> to learn from.

That wasn't the minor mistake I was referring to. The mistake was that it
dropped a __read_mostly annotation. Which, after further review, was
actually OK, and something that should have been done in a separate patch. I
have other tags that are required for alignment and such, otherwise things
may break mysteriously. It could easily have dropped a different tag that
would have made a difference.

The point here is that it most definitely will introduce very subtle
mistakes. More subtle than what a human may produce and much harder to spot.

-- Steve

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-07 13:00               ` Lorenzo Stoakes
  2025-08-11 21:26                 ` Luis Chamberlain
@ 2025-08-12 14:19                 ` Steven Rostedt
  1 sibling, 0 replies; 43+ messages in thread
From: Steven Rostedt @ 2025-08-12 14:19 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: Mark Brown, James Bottomley, Sasha Levin, ksummit

On Thu, 7 Aug 2025 14:00:56 +0100
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> On Thu, Aug 07, 2025 at 01:25:23PM +0100, Mark Brown wrote:
> > On Wed, Aug 06, 2025 at 08:26:41PM +0100, Lorenzo Stoakes wrote:
> >  
> > > - Was the commit message of this patch generated in large part by an LLM
> > >   (excluding non-native speakers using an LLM to simply assist writing it
> > >   in english)?  
> >
> > Easiest way to say that is probably to say "written by" and "translated
> > by".  I think for all these things we should just talk about tooling
> > rather than specifically LLMs, as well as avoiding any rules lawyering
> > about how precisely a given tool is implemented it's probably useful to
> > know about tools whatever the technology.  
> 
> That's a great idea!!
> 
> And agreed on language/rules lawyering, I think we have to have something
> _simple_ and robust at least to begin with.
> 
> There are inevitably blurry lines everywhere with this stuff so I think we
> have to be pragmatic and as clear as we possibly can. We can always refine
> things later as we learn.

I honestly don't care who or what writes the change log. As long as the
reason for the change is clear, and I can read it and know what the
patch is doing and, more importantly, why it is doing it, then I can be the
judge.

If I don't understand the change log, I simply send the patch back and say,
"Please explain what you are doing and why you are doing it, better".

For the longest time, I have rewritten other people's change logs myself
if they don't explain things properly. Mostly from non-native speakers.
Depending on how much of a rewrite I do, I'll get a confirmation from the
submitter to make sure it's what they intended. If it's just fixing grammar
mistakes I don't even ask.

I used to say "Can you write something like this:" and explain it better,
and the next version of the patch would always have what I suggested
written verbatim, so I stopped doing that :-p

-- Steve


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-06 19:30       ` Lorenzo Stoakes
@ 2025-08-12 14:37         ` Steven Rostedt
  2025-08-12 15:02           ` Sasha Levin
  0 siblings, 1 reply; 43+ messages in thread
From: Steven Rostedt @ 2025-08-12 14:37 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: Dan Carpenter, Julia Lawall, H. Peter Anvin, ksummit

On Wed, 6 Aug 2025 20:30:25 +0100
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> > I feel like I have seen patches where people have generated AI
> > documentation for locking.  The problem is that if you ask AI to write
> > something it always has a very confident answer but normally it's vague
> > and slightly wrong.  It takes no time to generate these patches but it
> > takes a while to review them.  
> 
> The asymmetric relationship between effort to generate vs. effort to review
> is a very serious aspect of all this.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          THIS!

My biggest concern with AI is that it may cause me to spend more time
reviewing what it produced than it would take me to write the code myself.

When I finally got people to help me with my work, my productivity dropped
substantially, as I found I was spending more time helping them with tasks
than it would have taken me to do it myself. But over time, that changed.

This could be the same with AI if you had a model that would learn from
what you taught it. But I suspect that AI generators will not be taking my
input from what I find and I'll likely be telling the same AI generator how
to fix the mistake over and over again.

One thing I look forward to with AI is that it will likely help me debug my
own code. I suspect it would be really good at reading my code and say
"Hey Stupid! You forgot to free this value over here". And things like that.
In other words, I trust AI to debug my code more than I can debug its code.

What's that saying? "To debug code requires someone that is smarter than
the one who wrote it."

-- Steve

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-12 14:37         ` Steven Rostedt
@ 2025-08-12 15:02           ` Sasha Levin
  2025-08-12 15:24             ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Sasha Levin @ 2025-08-12 15:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Lorenzo Stoakes, Dan Carpenter, Julia Lawall, H. Peter Anvin, ksummit

On Tue, Aug 12, 2025 at 10:37:10AM -0400, Steven Rostedt wrote:
>One thing I look forward to with AI is that it will likely help me debug my
>own code. I suspect it would be really good at reading my code and say
>"Hey Stupid! You forgot to free this value over here". And things like that.
>In other words, I trust AI to debug my code more than I can debug its code.

I'm surprised that this isn't a more common practice at this point.
Sure, AI will make mistakes, but I think that:

1. It's much better at reviewing rather than writing.
2. Mistakes aren't critical (you'd review it yourself anyway).
3. No legal issues.

As an example:

$ git ol 391dda1bd7c56de62b96126214f040fe8965561b
391dda1bd7c5 ("tracing: Use hashtable.h for event_hash")

$ git checkout 391dda1bd7c5^
HEAD is now at 7f81f27b1093 tracing: Fix use-after-free in print_graph_function_flags during tracer switching

$ git format-patch -1 391dda1bd7c5
0001-tracing-Use-hashtable.h-for-event_hash.patch

$ echo "I want to send @0001-tracing-Use-hashtable.h-for-event_hash.patch upstream. Carefully review the patch, identify any issues, and write a report." | claude -p
# Patch Review Report: tracing: Use hashtable.h for event_hash

## Summary
This patch converts the manually implemented hash table in `kernel/trace/trace_output.c` to use the generic hashtable API from `<linux/hashtable.h>`. The conversion is technically correct and simplifies the code, but there is one notable issue that should be addressed.

## Review Findings

### ✅ Positive Aspects

1. **Correct API Usage**: The patch properly uses the hashtable.h API:
    - `DEFINE_HASHTABLE()` for static initialization
    - `hash_add()` for insertion
    - `hash_del()` for deletion
    - `hash_for_each_possible()` for lookup

2. **Proper Constant Renaming**: The rename from `EVENT_HASHSIZE` to `EVENT_HASH_BITS` correctly reflects the semantic change (from size to bit count).

3. **Functionally Equivalent**: The new implementation maintains the same hash table size (2^7 = 128 entries) and hashing behavior.

4. **Code Simplification**: Removes manual bucket calculation logic, making the code cleaner and more maintainable.

### ❌ Issue Found

**Loss of `__read_mostly` Annotation**

The original implementation declares the hash table with `__read_mostly`:
```c
static struct hlist_head event_hash[EVENT_HASHSIZE] __read_mostly;
```

The patch changes this to:
```c
static DEFINE_HASHTABLE(event_hash, EVENT_HASH_BITS);
```

The `DEFINE_HASHTABLE` macro does not preserve the `__read_mostly` annotation. This could impact performance as:
- The event hash table is primarily read during event lookups
- It's only written during event registration/unregistration (rare operations)
- Without `__read_mostly`, the hash table may not be optimally placed in cache-friendly memory sections

### Recommended Fix

The hashtable.h header provides `DEFINE_READ_MOSTLY_HASHTABLE` macro specifically for this use case. The patch should use:
```c
static DEFINE_READ_MOSTLY_HASHTABLE(event_hash, EVENT_HASH_BITS);
```

This would preserve the performance optimization of the original code.

## Verification Checklist

- [x] Patch applies cleanly
- [x] Hash table size remains unchanged (128 entries)
- [x] Hash function behavior is preserved
- [x] Locking semantics remain the same
- [x] All hash table operations correctly converted
- [ ] Performance annotations preserved (`__read_mostly`)

## Recommendation

**Hold for revision** - The patch should be updated to use `DEFINE_READ_MOSTLY_HASHTABLE` instead of `DEFINE_HASHTABLE` to maintain the performance optimization present in the original code.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-12 15:02           ` Sasha Levin
@ 2025-08-12 15:24             ` Paul E. McKenney
  2025-08-12 15:25               ` Sasha Levin
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2025-08-12 15:24 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Steven Rostedt, Lorenzo Stoakes, Dan Carpenter, Julia Lawall,
	H. Peter Anvin, ksummit

On Tue, Aug 12, 2025 at 11:02:50AM -0400, Sasha Levin wrote:
> On Tue, Aug 12, 2025 at 10:37:10AM -0400, Steven Rostedt wrote:
> > One thing I look forward to with AI is that it will likely help me debug my
> > own code. I suspect it would be really good at reading my code and say
> > "Hey Stupid! You forgot to free this value over here". And things like that.
> > In other words, I trust AI to debug my code more than I can debug its code.
> 
> I'm surprised that this isn't too common of a practice at this point.
> Sure, AI will make mistakes, but I think that:
> 
> 1. It's much better at reviewing rather than writing.
> 2. Mistakes aren't critical (you'd review it yourself anyway).
> 3. No legal issues.

Ahem.  Please do us all a favor, yourself included, and go count the
number of lawsuits currently in flight on this topic.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-12 15:24             ` Paul E. McKenney
@ 2025-08-12 15:25               ` Sasha Levin
  2025-08-12 15:28                 ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Sasha Levin @ 2025-08-12 15:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Lorenzo Stoakes, Dan Carpenter, Julia Lawall,
	H. Peter Anvin, ksummit

On Tue, Aug 12, 2025 at 08:24:11AM -0700, Paul E. McKenney wrote:
>On Tue, Aug 12, 2025 at 11:02:50AM -0400, Sasha Levin wrote:
>> On Tue, Aug 12, 2025 at 10:37:10AM -0400, Steven Rostedt wrote:
>> > One thing I look forward to with AI is that it will likely help me debug my
>> > own code. I suspect it would be really good at reading my code and say
>> > "Hey Stupid! You forgot to free this value over here". And things like that.
>> > In other words, I trust AI to debug my code more than I can debug its code.
>>
>> I'm surprised that this isn't too common of a practice at this point.
>> Sure, AI will make mistakes, but I think that:
>>
>> 1. It's much better at reviewing rather than writing.
>> 2. Mistakes aren't critical (you'd review it yourself anyway).
>> 3. No legal issues.
>
>Ahem.  Please do us all a favor, yourself included, and go count the
>number of lawsuits currently in flight on this topic.

For using AI for review, where it doesn't generate any code? I'd love a
reference please :)

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
  2025-08-12 15:25               ` Sasha Levin
@ 2025-08-12 15:28                 ` Paul E. McKenney
  0 siblings, 0 replies; 43+ messages in thread
From: Paul E. McKenney @ 2025-08-12 15:28 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Steven Rostedt, Lorenzo Stoakes, Dan Carpenter, Julia Lawall,
	H. Peter Anvin, ksummit

On Tue, Aug 12, 2025 at 11:25:49AM -0400, Sasha Levin wrote:
> On Tue, Aug 12, 2025 at 08:24:11AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2025 at 11:02:50AM -0400, Sasha Levin wrote:
> > > On Tue, Aug 12, 2025 at 10:37:10AM -0400, Steven Rostedt wrote:
> > > > One thing I look forward to with AI is that it will likely help me debug my
> > > > own code. I suspect it would be really good at reading my code and say
> > > > "Hey Stupid! You forgot to free this value over here". And things like that.
> > > > In other words, I trust AI to debug my code more than I can debug its code.
> > > 
> > > I'm surprised that this isn't too common of a practice at this point.
> > > Sure, AI will make mistakes, but I think that:
> > > 
> > > 1. It's much better at reviewing rather than writing.
> > > 2. Mistakes aren't critical (you'd review it yourself anyway).
> > > 3. No legal issues.
> > 
> > Ahem.  Please do us all a favor, yourself included, and go count the
> > number of lawsuits currently in flight on this topic.
> 
> For using AI for review, where it doesn't generate any code? I'd love a
> reference please :)

Code is not the only thing that is subject to copyright.  But please don't
take my word for it, go ask Napster.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread

Thread overview: 43+ messages
2025-08-05 16:03 [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process Lorenzo Stoakes
2025-08-05 16:43 ` James Bottomley
2025-08-05 17:11   ` Mark Brown
2025-08-05 17:23     ` James Bottomley
2025-08-05 17:43       ` Sasha Levin
2025-08-05 17:58         ` Lorenzo Stoakes
2025-08-05 18:16       ` Mark Brown
2025-08-05 18:01     ` Lorenzo Stoakes
2025-08-05 18:46       ` Mark Brown
2025-08-05 19:18         ` Lorenzo Stoakes
2025-08-05 17:17   ` Stephen Hemminger
2025-08-05 17:55   ` Lorenzo Stoakes
2025-08-05 18:23     ` Lorenzo Stoakes
2025-08-12 13:44       ` Steven Rostedt
2025-08-05 18:34     ` James Bottomley
2025-08-05 18:55       ` Lorenzo Stoakes
2025-08-12 13:50       ` Steven Rostedt
2025-08-05 18:39     ` Sasha Levin
2025-08-05 19:15       ` Lorenzo Stoakes
2025-08-05 20:02         ` James Bottomley
2025-08-05 20:48           ` Al Viro
2025-08-06 19:26           ` Lorenzo Stoakes
2025-08-07 12:25             ` Mark Brown
2025-08-07 13:00               ` Lorenzo Stoakes
2025-08-11 21:26                 ` Luis Chamberlain
2025-08-12 14:19                 ` Steven Rostedt
2025-08-06  4:04       ` Alexey Dobriyan
2025-08-06 20:36         ` Sasha Levin
2025-08-05 21:58   ` Jiri Kosina
2025-08-06  6:58     ` Hannes Reinecke
2025-08-06 19:36       ` Lorenzo Stoakes
2025-08-06 19:35     ` Lorenzo Stoakes
2025-08-05 18:10 ` H. Peter Anvin
2025-08-05 18:19   ` Lorenzo Stoakes
2025-08-06  5:49   ` Julia Lawall
2025-08-06  9:25     ` Dan Carpenter
2025-08-06  9:39       ` Julia Lawall
2025-08-06 19:30       ` Lorenzo Stoakes
2025-08-12 14:37         ` Steven Rostedt
2025-08-12 15:02           ` Sasha Levin
2025-08-12 15:24             ` Paul E. McKenney
2025-08-12 15:25               ` Sasha Levin
2025-08-12 15:28                 ` Paul E. McKenney
