RE: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code

From: "Bird, Tim" <Tim.Bird@sony.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	Krzysztof Kozlowski <krzk@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>, Jiri Kosina <jkosina@suse.com>,
	"ksummit@lists.linux.dev" <ksummit@lists.linux.dev>
Subject: RE: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
Date: Tue, 12 Aug 2025 13:15:33 +0000
Message-ID: <MW5PR13MB56323AC4400A7CC5A1880BE6FD2BA@MW5PR13MB5632.namprd13.prod.outlook.com>
In-Reply-To: <c0ecacbefa1e93cae4176dc368f2ea63f611f56c.camel@HansenPartnership.com>

> -----Original Message-----
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
> On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > > On 05/08/2025 19:50, Sasha Levin wrote:
> > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > >
> > > > > This is not really about legal aspects of AI-generated code and
> > > > > > patches, I believe that'd be handled well by LF,
> > > > > DCO, etc.
> > > > >
> > > > > My concern here is more "human to human", as in "if I need to
> > > > > talk to a human that actually does understand the patch deeply
> > > > > enough, in context, etc .. who is that?"
> > > > >
> > > > > I believe we need to at least settle on (and document) the way
> > > > > how to express in patch (meta)data:
> > > > >
> > > > > - this patch has been assisted by LLM $X
> > > > > - the human understanding the generated code is $Y
> > > > >
> > > > > We might just implicitly assume this to be the first person in
> > > > > the S-O-B chain (which I personally don't think works for all
> > > > > scenarios, you can have multiple people working on it, etc),
> > > > > but even in such case I believe this needs to be clearly
> > > > > documented.
> > > >
> > > > The above isn't really an AI problem though.
> > > >
> > > > We already have folks sending "checkpatch fixes" which only make
> > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > but are completely bogus otherwise.
> > > >
> > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > problem, but tackling just the AI side of it is addressing one of
> > > > the symptoms, not the underlying issue.
> > >
> > > > I think there is an important difference in process and in result
> > > between using existing tools, like coccinelle, sparse or even
> > > checkpatch, and AI-assisted coding.
> > >
> > > For the first you still need to write actual code and since you are
> > > writing it, most likely you will compile it. Even if people fix the
> > > warnings, not the problems, they still at least write the code and
> > > thus this filters at least people who never wrote C.
> > >
> > > With AI you do not have to even write it. It will hallucinate,
> > > create some sort of C code and you just send it. No need to compile
> > > it even!
> >
> > Completely agreed, and furthermore, depending on how that AI was
> > trained, those using that AI's output might have some difficulty
> > meeting the requirements of the second portion of clause (a) of
> > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > submit it under the open source license indicated in the file".
> 
> Just on the legality of this.  Under US Law, provided the output isn't
> a derivative work (and all the suits over training data have so far
> failed to prove that it is),

This is indeed so.  I have followed the GitHub copilot litigation
(see https://githubcopilotlitigation.com/case-updates.html), and a few
other cases related to whether AI output violates the copyright of the training
data (that is, is a form of derivative work).  I'm not a lawyer, but the legal
reasoning in the judgements handed down so far has been, IMHO, atrocious.
Some claims have been thrown out because the output was not identical
to the training data (even when things like comments from the code in
the training data were copied verbatim into the output).  Companies doing
AI code generation now scrub their outputs to make sure nothing
in the output is identical to material in the training data.  However, I'm not
sure this is enough, and this requirement for identicality (to prove derivative work)
is problematic, when copyright law only requires proof of substantial similarity.

The Copilot case is going through appeal now, and I wouldn't bet on which
way the outcome will go.  It could very well turn out that AI output is deemed
to be a derivative work of the training data in some cases.  If that occurs,
then even restricting training data to GPL code wouldn't be a sufficient
workaround to enable using the AI output in the kernel.  And, as has been
stated elsewhere, there are currently no major models restricting their code
training data to permissively licensed code.  This makes it infeasible to use
any of the popular models with a high degree of certainty that the output is
legally OK.

No legal pun intended, but I think the jury is still out on this issue, and I think it
would be wise to be EXTREMELY cautious introducing AI-generated code into the kernel.
I personally would not submit something for inclusion into the kernel proper that
was AI-generated.  Generation of tools or tests is, IMO, a different matter and I'm
less concerned about that.

Getting back to the discussion at hand, I believe that annotating that a contribution was
AI-generated (or that AI was involved) will at least give us some assistance to re-review
the code and possibly remove or replace it should the legal status of AI-generated code
become problematic in the future.

There is also value in flagging that additional scrutiny may be warranted
at the time of submission.  So I like the idea in principle.

 -- Tim

> copyright in an AI created piece of code,
> actually doesn't exist because a non human entity can't legally hold
> copyright of a work.  The US copyright office has actually issued this
> opinion (huge 3 volume report):
> 
> https://www.copyright.gov/ai/
> 
> But amazingly enough congress has a more succinct summary:
> 
> https://www.congress.gov/crs-product/LSB10922
> 
> But the bottom line is that pure AI generated code is effectively
> uncopyrightable and therefore public domain which means anyone
> definitely has the right to submit it to the kernel under the DCO.
> 
> I imagine this situation might be changed by legislation in the future
> when people want to monetize AI output, but such a change can't be
> retroactive, so for now we're OK legally to accept pure AI code with
> the signoff of the submitter (and whatever AI annotation tags we come
> up with).
> 
> Of course if you take AI output and modify it before submitting, then
> the modifications do have copyright (provided a human made them).
> 
> Regards,
> 
> James
>
