From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
Jiri Kosina <jkosina@suse.com>,
ksummit@lists.linux.dev
Subject: Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
Date: Thu, 21 Aug 2025 12:23:29 +0200
Message-ID: <20250821122329.03c77178@foz.lan>
In-Reply-To: <d565cb60-29bd-4774-995d-0154c0046710@paulmck-laptop>
On Wed, 20 Aug 2025 14:44:00 -0700,
"Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Tue, Aug 19, 2025 at 06:16:10PM +0200, Mauro Carvalho Chehab wrote:
> > On Tue, Aug 19, 2025 at 04:23:46PM +0100, James Bottomley wrote:
> > > On August 18, 2025 10:07:29 PM GMT+01:00, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > >On Tue, 12 Aug 2025 07:42:21 -0700,
> > > >"Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > [...]
> > > > do agree that many of the lawsuits seem to be motivated by an
> > > > >> overwhelming desire to monetize the output of AI that was induced by
> > > >> someone else's prompts, if that is what you are getting at. It does seem
> > > >> to me personally that after you have sliced and diced the training data,
> > > >> fair use should apply, but last I checked, fair use was a USA-only thing.
> > > >
> > > > >Maybe, but other countries have similar concepts. I remember I once saw
> > > > >an interpretation of the Brazilian copyright law from a famous lawyer
> > > > >on property rights matters, stating that reproducing small parts of a book,
> > > > >for instance, could be ok, under certain circumstances (in a concept
> > > > >similar to US fair use).
> > >
> > > Yes, technically. Article 10 of the Berne convention contains a weaker concept allowing quotations without encumbrance based on a three prong test that the quote isn't extensive, doesn't rob the rights holder of substantial royalties and doesn't unreasonably prejudice the existing copyright rights.
> >
> > Exactly. The interpretation from the speech I mentioned is based on that.
> > Now, exactly what is substantial is something that could be argued.
> >
> > There are two scenarios to consider:
> >
> > 1. AI using public domain or Open Source licensed code;
> >
> > There are so many variations of the same code patterns that AI
> > was trained on that it sounds unlikely that the produced output would
> > contain a substantial amount of the original code.
> >
> > 2. Public AI used to develop closed source
> >
> > If someone from VendorA trains a public AI to develop an IP protected driver
> > for HardwareA with a very specialized unique code, and someone asks the
> > same AI to:
> >
> > "write a driver for HardwareA"
> >
> > and gets about the same code, then this would be a possible legal issue.
> >
> > Yet, in such a case, the developer from VendorA, by using a public AI
> > and allowing it to be trained with the code, opened the code to be used
> > elsewhere, possibly violating an NDA. For instance, if he used
> > ChatGPT, this license term applies:
> >
> > "3. License to OpenAI
> >
> > When you use the service, you grant OpenAI a license to use
> > your input for the purpose of providing and improving the
> > service—this may include model training unless you’ve opted out.
> >
> > This license is non-exclusive, worldwide, royalty-free,
> > sublicensable—but it's only used as outlined in the Terms of Use
> > and privacy policies."
> >
> > So, if he didn't opt out, he granted ChatGPT and its users a patent-free,
> > sublicensable license to the code.
> >
> > Ok, other LLM tools may have different terms, but if we end up having
> > too many people trying to monetize it, the usage terms will be
> > modified to keep AI holders from facing legal issues.
> >
> > Still, while I'm not a lawyer, my understanding of (2)
> > is that if one uses it for closed source development and allowed,
> > implicitly or explicitly, the inputs to be used for training, the one
> > that will be held accountable, in cases involving IP leaks, is the
> > person who submitted IP-protected property to the AI.
>
> Many of the AI players scrape the web, and might well pull in training
> data from web pages having a restrictive copyright. The AI's output
> might then be influenced by that restricted training data.
True, but this is no different from a developer searching the web for
answers to his development problems, reading textbooks and/or reading
articles.

Also, if someone publicly documents something on any sort of media,
it is expected that people will read it, acquire knowledge from it and
eventually materialize the acquired knowledge into something. This
is fair use, and has some provision in the Berne Convention, although
it may depend on each country's specific laws.

In my view, if the training data comes from lots of different
places, then, as AI is actually a stochastic process that writes
code by predicting the next code words, if there's just one web
site with a specific pattern, the chances of getting exactly
the same code are pretty low. It is far more likely that a human
would pick exactly the same code as written in his favorite
textbook than an LLM fed with hundreds of thousands of web
sites.
> Although we
> might desperately want this not to be a problem for AI-based submissions
> to the Linux kernel, what we want and what the various legal systems
> actually give us are not guaranteed to have much relation to each other.
True, but that's not the point. AI is not that different from
someone googling the net for answers.

The only difference is that, when AI is used, you won't know
exactly what the code was based on.

I agree that this could be problematic. But then, again, when a maintainer
picks a patch from someone else, the same applies: we don't have any
guarantees that the code was not just copied-and-pasted from some place,
except for the SoB.

In any case (either AI, human or hybrid AI/human), if the code has issues,
we may need to revert it.

In other words, AI doesn't radically change it: in the end, all remains
the same.

That's why I don't think we'll get any new information, or need to
follow any procedure different from what we already do, from knowing
whether the developer used AI, and to what extent.
-
Now, a completely different thing is if we start having "incompetent"
developers ("incompetent" in the sense given by the Dilbert Principle) who
use some AI patch-generator bot to write patches they can't write themselves.
I'll certainly reject such patches and place such individuals on my
reject list.
Thanks,
Mauro