On Tue, Aug 05, 2025 at 01:23:18PM -0400, James Bottomley wrote:
> On Tue, 2025-08-05 at 18:11 +0100, Mark Brown wrote:

> > Patch backporting sounds pretty scary to me, it's the sort of thing
> > where extra context that needs to be accounted for is very likely to
> > come up (eg, assumptions you can make about existing state or
> > sanitisation).

> If you think about it, the git history contains the exact patch path
> between where the patch was applied and where you want to apply it.
> That's a finite data set which LLMs can be trained to work nicely with.

> >   That trips up humans often enough and doesn't seem like it's
> > playing to the strengths advertised for LLMs.

> Humans don't look at the patch path (or use something broad like a
> range scan). The AI can be patient enough to actually go over it all.

The thing humans are usually doing in a situation like that is
remembering that someone changed something and why, and of course the
new dependencies that came in.  I see what you're saying, but I'm
rather nervous about what people would actually do and how effective
the results would be, especially where things get complicated and
there are landmines.

> > TBH I'm not thrilled about the general test code is trivial
> > assumption either,

> I don't think anyone who trains AI thinks testing is trivial. It does
> take special training for AI to be good at test writing.

I think a lot of the people saying "oh, we can just churn that out with
AI" kind of things do have that sort of attitude.  This thread is far
from the first time I've seen people saying tests are a great
application, and it's usually more as a contrast to the complicated
stuff in the kernel than from any consideration of the specific
benefits these tools might offer in this application.

> > unstable test code or test code that doesn't cover what people think
> > it covers are both problems.

> Test coverage and constructing tests for coverage is another place AI
> can help. Especially given coverage is a measurable quantity which
> makes training easier.

There's definitely some opportunity for specialist stuff there,
especially if you're just looking at measurable metrics like the ones
you're mentioning.  Other tools in this area are also available, of
course!

> >   The issues when things go wrong are less severe than the kernel
> > itself but things still need to be maintained and we already have
> > issues with people being dismissive of the selftests.

> Well our selftests, having just spent ages figuring out how to run a
> subset of the bpf tests, are very eccentric ... in that each test set
> runs in a completely different way from any of the others and knowledge
> from one selftest area doesn't apply to a different one.

They should all run from the selftests harness, so at least the simple
business of actually running them should be standard?  We do have some
suites that were thrown into the kernel with marginal integration with
the frameworks, but they're generally fairly obvious as soon as you go
in via the standard interfaces.  I'm not saying the overall picture is
amazing, but I see a big part of it being a social problem with getting
people to take what we've got seriously.
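
For reference, the way in via the common harness should just be
something like the below (using bpf purely as an example target):

	# build and run one suite through the kselftest harness
	make -C tools/testing/selftests TARGETS=bpf run_tests

	# or equivalently from the top of the tree
	make TARGETS=bpf kselftest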