From: "Paul E. McKenney" <paulmck@kernel.org>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: "Bird, Tim" <Tim.Bird@sony.com>,
James Bottomley <James.Bottomley@hansenpartnership.com>,
Krzysztof Kozlowski <krzk@kernel.org>,
Sasha Levin <sashal@kernel.org>, Jiri Kosina <jkosina@suse.com>,
"ksummit@lists.linux.dev" <ksummit@lists.linux.dev>
Subject: Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
Date: Tue, 19 Aug 2025 08:01:43 -0700
Message-ID: <d2d7e4c3-264e-4b16-a471-3fa36c1225eb@paulmck-laptop>
In-Reply-To: <20250818231223.063c2f12@foz.lan>

On Mon, Aug 18, 2025 at 11:12:23PM +0200, Mauro Carvalho Chehab wrote:
> On Tue, 12 Aug 2025 13:15:33 +0000
> "Bird, Tim" <Tim.Bird@sony.com> wrote:
>
> > > -----Original Message-----
> > > From: James Bottomley <James.Bottomley@HansenPartnership.com>
> > > On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> > > > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > > > > On 05/08/2025 19:50, Sasha Levin wrote:
> > > > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > > > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > > >
> > > > > > > This is not really about legal aspects of AI-generated code and
> > > > > > > > patches, I believe that'd be handled well by LF,
> > > > > > > DCO, etc.
> > > > > > >
> > > > > > > My concern here is more "human to human", as in "if I need to
> > > > > > > talk to a human that actually does understand the patch deeply
> > > > > > > enough, in context, etc .. who is that?"
> > > > > > >
> > > > > > > > I believe we need to at least settle on (and document) how to
> > > > > > > > express in patch (meta)data:
> > > > > > >
> > > > > > > - this patch has been assisted by LLM $X
> > > > > > > - the human understanding the generated code is $Y
> > > > > > >
> > > > > > > We might just implicitly assume this to be the first person in
> > > > > > > the S-O-B chain (which I personally don't think works for all
> > > > > > > scenarios, you can have multiple people working on it, etc),
> > > > > > > but even in such case I believe this needs to be clearly
> > > > > > > documented.
> > > > > >
> > > > > > The above isn't really an AI problem though.
> > > > > >
> > > > > > We already have folks sending "checkpatch fixes" which only make
> > > > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > > > but are completely bogus otherwise.
> > > > > >
> > > > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > > > problem, but tackling just the AI side of it is addressing one of
> > > > > > the symptoms, not the underlying issue.
> > > > >
> > > > > > I think there is an important difference in process and in result
> > > > > between using existing tools, like coccinelle, sparse or even
> > > > > checkpatch, and AI-assisted coding.
> > > > >
> > > > > For the first you still need to write actual code and since you are
> > > > > writing it, most likely you will compile it. Even if people fix the
> > > > > warnings, not the problems, they still at least write the code and
> > > > > thus this filters at least people who never wrote C.
> > > > >
> > > > > With AI you do not have to even write it. It will hallucinate,
> > > > > create some sort of C code and you just send it. No need to compile
> > > > > it even!
> > > >
> > > > Completely agreed, and furthermore, depending on how that AI was
> > > > trained, those using that AI's output might have some difficulty
> > > > meeting the requirements of the second portion of clause (a) of
> > > > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > > > submit it under the open source license indicated in the file".
> > >
> > > Just on the legality of this. Under US Law, provided the output isn't
> > > a derivative work (and all the suits over training data have so far
> > > failed to prove that it is),
> >
> > This is indeed so. I have followed the GitHub copilot litigation
> > (see https://githubcopilotlitigation.com/case-updates.html), and a few
> > other cases related to whether AI output violates the copyright of the training
> > data (that is, is a form of derivative work). I'm not a lawyer, but the legal
> > reasoning for judgements passed down so far has been, IMHO, atrocious.
> > Some claims have been thrown out because the output was not identical
> > to the training data (even when things like comments from the code in
> > the training data were copied verbatim into the output). Companies doing
> > AI code generation now scrub their outputs to make sure nothing
> > in the output is identical to material in the training data. However, I'm not
> > sure this is enough, and this requirement for identicality (to prove derivative work)
> > is problematic, when copyright law only requires proof of substantial similarity.
> >
> > The copilot case is going through appeal now, and I wouldn't bet on which
> > way the outcome will drop. It could very well yet result that AI output is deemed
> > to be derivative work of the training data in some cases. If that occurs, then even restricting
> > training data to GPL code wouldn't be a sufficient workaround to enable using the AI output
> > in the kernel. And, as has been stated elsewhere, there are currently no major models restricting
> > their code training data to permissively licensed code. This makes it infeasible to use
> > any of the popular models with a high degree of certainty that the output is legally OK.
> >
> > No legal pun intended, but I think the jury is still out on this issue, and I think it
> > would be wise to be EXTREMELY cautious introducing AI-generated code into the kernel.
> > I personally would not submit something for inclusion into the kernel proper that
> > was AI-generated. Generation of tools or tests is, IMO, a different matter and I'm
> > less concerned about that.
> >
> > Getting back to the discussion at hand, I believe that annotating that a contribution was
> > AI-generated (or that AI was involved) will at least give us some assistance to re-review
> > the code and possibly remove or replace it should the legal status of AI-generated code
> > become problematic in the future.
>
> Heh, it could produce exactly the opposite effect: anyone who has
> code that slightly resembles a patch stating that AI was used could try
> to monetize the merging of that patch.

This is one of my concerns as well.

							Thanx, Paul

> > There is also value in flagging that additional scrutiny may be warranted
> > at the time of submission. So I like the idea in principle.
>
>
> Thanks,
> Mauro