Hi everyone,

Depending on how you look at things, this is potentially a topic for either MS or KS. One way to lower the load on maintainers is to make it easier for contributors to send higher quality patches, and to catch errors before they land in the various git trees.

Along those lines, when the AI code submission thread started over the summer, I decided to see if it was possible to get reasonable code reviews out of AI. There are certainly false positives, but Alexei and the BPF developers wired my prompts into the BPF CI, and you can find the results in their GitHub CI. Everything in red is a bug the AI review found:

https://github.com/kernel-patches/bpf/actions/workflows/ai-code-review.yml

My goal for KS/MS is to discuss how to enable maintainers to use review automation tools to lower their workload. I don't want to build new CI here, so the goal would be enabling integration with existing CI.

My question for everyone is: what would it take to make all of this useful? I'm working on funding for API access, so hopefully that part won't be a problem. There's definite overlap between the bugs I'm finding and the bugs Dan Carpenter finds, so I'm hoping he and I can team up as well.

In terms of actual review details, the reviews have two parts:

1) The review prompts. These are standalone and can work on any kernel tree. This is what BPF CI is currently using:

   https://github.com/masoncl/review-prompts/

   These prompts can also debug oopsen or syzbot reports (with varying success).

2) A code indexing tool with an MCP server that Claude can use to find functions, types, and call chains more effectively. This makes it more likely Claude can trace complex relationships in the code:

   https://github.com/facebookexperimental/semcode

   Asking Claude to produce a callgraph for btrfs_search_slot() consumes ~444K tokens. With semcode installed, the same query produces better results and uses 25K tokens (btrfs_search_slot() has a huge callchain).

I don't think BPF CI is using this yet, but if not, we'll move to it and compare the review results.

The reviews are meant to look like emails on lkml, and even when wildly wrong they definitely succeed there. I've attached the results of a run against 600 random commits in linux-next, and the last 400 commits of net-next (as of Oct 2nd). There are both real bugs and false positives in there, so it gives a good idea of the mixture of right and wrong that is common in the reviews.

-chris
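
P.S. To make the "integration with existing CI" part a little more concrete, here's a minimal sketch of the kind of glue I have in mind: feed each patch in a series, plus one of the review prompts, to the Claude Code CLI in non-interactive print mode and save the output next to the patch. The prompt filename below is a placeholder (check the review-prompts repo for the real entry points), and this isn't the actual wiring BPF CI uses, just an illustration of how little code the integration needs.

#!/usr/bin/env python3
# Sketch: run a review prompt over every patch in a directory using the
# Claude Code CLI in print mode ("claude -p").  The prompt path is a
# placeholder -- see https://github.com/masoncl/review-prompts/ for the
# real prompt files.
import pathlib
import subprocess
import sys

PROMPT_FILE = pathlib.Path("review-prompts/review.md")  # placeholder name

def review_patch(patch: pathlib.Path) -> str:
    # Hand the prompt text plus the patch to claude on stdin and capture
    # whatever review text comes back.
    prompt = PROMPT_FILE.read_text()
    result = subprocess.run(
        ["claude", "-p", prompt],
        input=patch.read_text(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

def main(series_dir: str) -> None:
    # Expect the output of "git format-patch" in series_dir; write one
    # .review file per patch so CI can post or attach it later.
    for patch in sorted(pathlib.Path(series_dir).glob("*.patch")):
        out = patch.with_suffix(".review")
        out.write_text(review_patch(patch))
        print(f"wrote {out}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")

Something along those lines dropped into an existing CI job (or a maintainer's local scripts) is roughly the level of integration I'm hoping to enable, rather than building new infrastructure.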