ksummit.lists.linux.dev archive mirror
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	Jiri Kosina <jkosina@suse.com>,
	ksummit@lists.linux.dev
Subject: Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
Date: Thu, 21 Aug 2025 19:30:41 +0200
Message-ID: <20250821193041.398ed30b@foz.lan>
In-Reply-To: <20250821125037.5cf5be3d@gandalf.local.home>

On Thu, 21 Aug 2025 12:50:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 21 Aug 2025 12:23:29 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > > Many of the AI players scrape the web, and might well pull in training
> > > data from web pages having a restrictive copyright.  The AI's output
> > > might then be influenced by that restricted training data.     
> > 
> > True, but this is no different from a developer searching the web for
> > answers to their development problems, reading textbooks, and/or reading
> > articles.
> 
> The difference I believe is that AI is still a computer program. It could,
> in theory, copy something exactly as is, where copyright does matter.
> 
> If you read something and were able to rewrite it verbatim, you would be
> subject to copyright infringement if what you read had limits on how you
> could reproduce it.

Maybe in the early days of LLMs this was true, but now that they are
massively trained by bots, the number of sources they draw training data
from is very large, and, given how artificial neurons work, they only
retain patterns that occur many times.

Now, if one asks it to do a web search, then the result can be
biased, just as when you google something.

> > Also, if someone publicly documents something in any sort of media,
> > it is expected that people will read it, acquire knowledge from it, and
> > eventually materialize the acquired knowledge into something. This
> > is fair use, and has some provisions in the Berne Convention, although
> > it may depend on each country's specific laws.
> 
> You can learn from it, but it also comes down to how much you actually copy
> from it.
> 
> > 
> > In my view, if the training data comes from lots of different
> > places, since AI is actually a stochastic process that writes
> > code by predicting the next code words, if there's just one web
> > site with a specific pattern, the chances of getting exactly
> > the same code are pretty low. It is far more likely that a human
> > would pick exactly the same code as written in their favorite
> > textbook than an LLM fed with hundreds of thousands of web
> > sites.
> 
> The issue I have with the above statement is, how would you know if the AI
> copied something verbatim or not? Are you going to ask it? "Hey, AI, was
> this code a direct copy of anything?" Would you trust its answer?
> 
> For a human to do the same, they would have to knowingly have done the copy.

Heh, if I ask you to write C code that prints something...

...
...
...
... 

I bet that one of the first things (if not the first) you
considered was: printf("Hello world!"). 

I also bet you can't remember the first time you saw it.

OK, this is a very small piece of code, but there are still patterns
that we learn over time and keep repeating in our code without
knowing where they came from, nor remembering whether the source
we picked them from was copyrighted.

In my case, I probably saw my first "Hello world" either in a book
or in some magazine a long time ago, copyrighted by its
authors, but I can't tell you for sure when I first saw it.

Do you remember the first time you saw that, and what copyrights
were there? :-)

Thanks,
Mauro

Thread overview: 97+ messages
2025-08-05 15:38 Jiri Kosina
2025-08-05 17:50 ` Sasha Levin
2025-08-05 18:00   ` Laurent Pinchart
2025-08-05 18:16     ` Sasha Levin
2025-08-05 21:53       ` Jiri Kosina
2025-08-05 22:41       ` Laurent Pinchart
2025-08-05 18:34     ` Lorenzo Stoakes
2025-08-05 22:06     ` Alexandre Belloni
2025-08-05 18:32   ` Lorenzo Stoakes
2025-08-08  8:31   ` Krzysztof Kozlowski
2025-08-11 21:46     ` Paul E. McKenney
2025-08-11 21:57       ` Luck, Tony
2025-08-11 22:12         ` Paul E. McKenney
2025-08-11 22:45           ` H. Peter Anvin
2025-08-11 22:52             ` Paul E. McKenney
2025-08-11 22:54           ` Jonathan Corbet
2025-08-11 23:03             ` Paul E. McKenney
2025-08-12 15:47               ` Steven Rostedt
2025-08-12 16:06                 ` Paul E. McKenney
2025-08-11 22:28         ` Sasha Levin
2025-08-12 15:49           ` Steven Rostedt
2025-08-12 16:03             ` Krzysztof Kozlowski
2025-08-12 16:12               ` Paul E. McKenney
2025-08-12 16:17                 ` Krzysztof Kozlowski
2025-08-12 17:12                   ` Steven Rostedt
2025-08-12 17:39                     ` Paul E. McKenney
2025-08-11 22:11       ` Luis Chamberlain
2025-08-11 22:51         ` Paul E. McKenney
2025-08-11 23:22           ` Luis Chamberlain
2025-08-11 23:42             ` Paul E. McKenney
2025-08-12  0:02               ` Luis Chamberlain
2025-08-12  2:49                 ` Paul E. McKenney
2025-08-18 21:41             ` Mauro Carvalho Chehab
2025-08-20 21:48               ` Paul E. McKenney
2025-08-12 16:01           ` Steven Rostedt
2025-08-12 16:22             ` Paul E. McKenney
2025-08-18 21:23           ` Mauro Carvalho Chehab
2025-08-19 15:25             ` Paul E. McKenney
2025-08-19 16:27               ` Mauro Carvalho Chehab
2025-08-20 22:03                 ` Paul E. McKenney
2025-08-21 10:54                   ` Miguel Ojeda
2025-08-21 11:46                     ` Mauro Carvalho Chehab
2025-08-12  8:38       ` James Bottomley
2025-08-12 13:15         ` Bird, Tim
2025-08-12 14:31           ` Greg KH
2025-08-18 21:12           ` Mauro Carvalho Chehab
2025-08-19 15:01             ` Paul E. McKenney
2025-08-12 14:42         ` Paul E. McKenney
2025-08-12 15:55           ` Laurent Pinchart
2025-08-18 21:07           ` Mauro Carvalho Chehab
2025-08-19 15:15             ` Paul E. McKenney
2025-08-19 15:23             ` James Bottomley
2025-08-19 16:16               ` Mauro Carvalho Chehab
2025-08-20 21:44                 ` Paul E. McKenney
2025-08-21 10:23                   ` Mauro Carvalho Chehab
2025-08-21 16:50                     ` Steven Rostedt
2025-08-21 17:30                       ` Mauro Carvalho Chehab [this message]
2025-08-21 17:36                         ` Luck, Tony
2025-08-21 18:01                           ` Mauro Carvalho Chehab
2025-08-21 19:03                             ` Steven Rostedt
2025-08-21 19:45                               ` Mauro Carvalho Chehab
2025-08-21 21:21                             ` Paul E. McKenney
2025-08-21 21:32                               ` Steven Rostedt
2025-08-21 21:49                                 ` Paul E. McKenney
2025-08-21 17:53                         ` Steven Rostedt
2025-08-21 18:32                           ` Mauro Carvalho Chehab
2025-08-21 19:07                             ` Steven Rostedt
2025-08-21 19:52                               ` Mauro Carvalho Chehab
2025-08-21 21:23                                 ` Paul E. McKenney
2025-08-22  7:55                         ` Geert Uytterhoeven
2025-08-21 20:38                     ` Jiri Kosina
2025-08-21 21:18                       ` Jiri Kosina
2025-08-21 20:46                     ` Paul E. McKenney
2025-08-18 17:53         ` Rafael J. Wysocki
2025-08-18 18:32           ` James Bottomley
2025-08-19 15:14             ` Paul E. McKenney
2025-08-18 19:13         ` Mauro Carvalho Chehab
2025-08-18 19:19           ` Jiri Kosina
2025-08-18 19:44             ` Rafael J. Wysocki
2025-08-18 19:47               ` Jiri Kosina
2025-08-18 22:44                 ` Laurent Pinchart
2025-08-06  8:17 ` Dan Carpenter
2025-08-06 10:13   ` Mark Brown
2025-08-12 14:36     ` Ben Dooks
2025-09-15 18:01 ` Kees Cook
2025-09-15 18:29   ` dan.j.williams
2025-09-16 15:36     ` James Bottomley
2025-09-16  9:39   ` Jiri Kosina
2025-09-16 15:31     ` James Bottomley
2025-09-16 14:20   ` Steven Rostedt
2025-09-16 15:00     ` Mauro Carvalho Chehab
2025-09-16 15:48       ` Steven Rostedt
2025-09-16 16:06         ` Luck, Tony
2025-09-16 16:58           ` H. Peter Anvin
2025-09-16 23:30     ` Kees Cook
2025-09-17 15:16       ` Steven Rostedt
2025-09-17 17:02       ` Laurent Pinchart