From: "Paul E. McKenney" <paulmck@kernel.org>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Jiri Kosina <jkosina@suse.com>, ksummit@lists.linux.dev
Subject: Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
Date: Wed, 20 Aug 2025 15:03:39 -0700	[thread overview]
Message-ID: <eb52bf02-48b6-43fa-93b4-29d827cfcb51@paulmck-laptop> (raw)
In-Reply-To: <wznbwwz2lywki34l5bdl327bpvdzvsmiwzjhdfe5ys7e7puwfy@652l53zffvnl>

On Tue, Aug 19, 2025 at 06:27:20PM +0200, Mauro Carvalho Chehab wrote:
> On Tue, Aug 19, 2025 at 08:25:39AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 18, 2025 at 11:23:32PM +0200, Mauro Carvalho Chehab wrote:
> > > Em Mon, 11 Aug 2025 15:51:48 -0700
> > > "Paul E. McKenney" <paulmck@kernel.org> escreveu:
> > > 
> > > > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > > > > On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:  
> > > > > > depending on how that AI was
> > > > > > trained, those using that AI's output might have some difficulty meeting
> > > > > > the requirements of the second portion of clause (a) of Developer's
> > > > > > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> > > > > > the open source license indicated in the file".  
> > > > > 
> > > > > If the argument is that certain LLM-generated code cannot be used for code under
> > > > > the DCO, then:
> > > > > 
> > > > > a) isn't this debatable? Do we want to itemize a safe list for AI models
> > > > >    which we think are safe to adopt for AI generated code?  
> > > > 
> > > > For my own work, I will continue to avoid use of AI-generated artifacts
> > > > for open-source software projects unless and until some of the more
> > > > consequential "debates" are resolved favorably.
> > > > 
> > > > > b) seems kind of too late  
> > > > 
> > > > Why?
> > > > 
> > > > > c) If something like the Generated-by tag is used, and we trust it, then
> > > > >    if we do want to side against merging AI generated code, that's perhaps our
> > > > >    only chance at blocking that type of code. It's, however, not bulletproof.  
> > > > 
> > > > Nothing is bulletproof.  ;-)
> > > 
> > > Let's face reality: even before AI generation, I more than once
> > > received completely identical patches from different developers
> > > with exactly the same content. Sometimes even the descriptions
> > > were similar; once or twice they were identical.
> > 
> > But of course.  And in at least some jurisdictions, one exception to
> > copyright is when there is only one way to express a given concept.
> > 
> > > Granted, those are fixes for obvious bugs (usually one-liners), but
> > > the point is: there are certain software patterns that are so common
> > > that there are lots of developers around the globe who are familiar
> > > with them. This is no different from an AI: if one asks it to write
> > > DPS code in some language (C, C++, Python, you name it), I bet the
> > > code will be at least 90% similar to what you or anyone else would write.
> > > 
> > > The rationale is that we're all trained directly or indirectly
> > > (including AI) with the same textbook algorithms or from someone
> > > that used such textbooks.
> > 
> > That may be true, but we should expect copyright law to continue to be
> > vigorously enforced from time to time.  Yes, I believe that the Linux
> > kernel community is a great group of people, but there is nevertheless
> > no shortage of people who would be happy to take legal action against
> > us if they thought doing so might benefit them.
> > 
> > > I can't see AI making it any better or worse from what we already
> > > have.
> > 
> > My assumption is that any time I ask an AI a question, neither the
> > question nor the answer is in any way private to me.
> 
> If you use a public service: no. But if you run AI via ollama, for
> instance, you're running it locally on your machine, in principle
> without access to the Internet.
> 
> > In contrast, as
> > far as I know, my own thoughts are private to me. 
> 
> Yes, up to the point you materialize them into something like a patch
> and let others see your work. If you do it on a public mailing list, it
> is now open for the public to know your ideas.

It is far worse than that.  If I post a patch that I generated with my
own wetware, all people see is the patch itself, along with any public
design documentation that I might have produced along the way.

If I use a public LLM, much more data is available, perhaps to bad actors,
on what training data went into producing that patch.  Absent some remote
mind-reading technology, that kind of data is simply not available for
wetware-generated patches.

Please understand that this is a very important difference.

> If one uses AI, one's input data can be used to train the next version
> of the model after some time. So, it may still be closed to the
> main audience for a couple of days/weeks/months (it all depends on the
> training policies - and on the AI vendor's release windows).
> 
> So, if you don't ever want others to see your code, don't use AI,
> except perhaps via a local service like ollama. But if you're using
> AI to help with open source development, and you won't take too
> much time to publish your work or it doesn't contain any special
> recipe, it is probably OK to use a public AI service.

Again, I am not anywhere near as worried about use of some AI-generated
patch after publication as I am about use of the connection of that
patch to the training data that helped to generate it.

Use of a local service might seem attractive, but even if you somehow
know for sure that it doesn't send your prompts off somewhere, it very
likely at least logs them, for customer-service purposes if nothing else.
That might be less obviously troubling than broadcasting the prompts
publicly, but any logged prompts are still discoverable in the legal
sense.
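[Editor's note: a minimal sketch of the local-only workflow mentioned
above, for concreteness. The model name "llama3" and the network-namespace
confinement are illustrative assumptions, not something from this thread;
the logging caveat is exactly the discoverability concern raised here.]

```shell
# Illustrative local-only LLM workflow (assumes ollama is installed
# on a Linux box; "llama3" is just an example model name).

# While still online, fetch the model weights once:
ollama pull llama3

# Afterwards, inference runs entirely on the local machine:
ollama run llama3 "Summarize this patch"

# To be strict about prompts never leaving the host, the server can be
# confined to a network namespace with only loopback available:
sudo unshare --net sh -c 'ip link set lo up; exec ollama serve'

# Caveat: even then, prompts and outputs may be logged locally
# (ollama keeps its state under ~/.ollama by default), and such
# logs remain discoverable in the legal sense.
```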

Please understand that you are communicating with someone who once had
lawyers come in and photocopy all the paper in his cube and copy all
the mass storage of all of his devices.  This is not at all theoretical.

> In the middle there are also paywalled AIs where the vendor
> gives some assurances about using (or not) your data for the
> model training.

Assurances are nice, but ransomware and other attack vectors can
render assurances meaningless, all of the vendor's good intentions
notwithstanding.

							Thanx, Paul

> > Yes, yes, give or take
> > facial expression, body language, pheromones, and similar, but I do not
> > believe even the best experts are going to deduce my technical innovations
> > from such clues.  Naive of me, perhaps, but that is my firm belief.  ;-)
> > 
> > That difference is highly nontrivial, and could quite possibly make
> > things far worse for us.
> 
> -- 
> Thanks,
> Mauro
> 
