Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process

ksummit.lists.linux.dev archive mirror
 help / color / mirror / Atom feed

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Steven Rostedt <rostedt@goodmis.org>, Jonathan Corbet <corbet@lwn.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Sasha Levin <sashal@kernel.org>,
	 ksummit@lists.linux.dev
Subject: Re: [MAINTAINERS SUMMIT] The role of AI and LLMs in the kernel process
Date: Mon, 08 Dec 2025 12:42:32 +0900	[thread overview]
Message-ID: <88091c9ac1d8f20bade177212445a60c752ba8b5.camel@HansenPartnership.com> (raw)
In-Reply-To: <20251207221532.4d8747f5@debian>

On Sun, 2025-12-07 at 22:15 -0500, Steven Rostedt wrote:
> On Sun, 07 Dec 2025 18:59:19 -0700
> Jonathan Corbet <corbet@lwn.net> wrote:
> 
> > > I contend there is a huge difference between *code* and
> > > descriptions/documentation/...
> 
> > 
> > As you might imagine, I'm not fully on board with that.  Code is
> > assumed plagiarized, but text is not?  Subtly wrong documentation
> > is OK?
> > 
> > I think our documentation requires just as much care as our code
> > does.
> 
> I assumed what hpa was mentioning about documentation, may be either
> translation of original text of the submitter, or AI looking at the
> code that was created and created a change log. In either case, the
> text was generated from the input of the author

I think this is precisely the problem Jon was referring to: you're
saying that if AI generates *text* based on input prompts it's not a
copyright problem, but if AI generates *code* based on input prompts,
it is.  As simply a neural net operational issue *both* input to output
sets are generated in the same way by the AI process and would have the
same legal probability of being copyright problems.  i.e. if the first
likely isn't a copyright problem, the second likely isn't as well (and
vice versa).

> . Where as AI generated code likely comes from somebody else's code.
> Perhaps AI was trained on somebody else's text, but the output will
> likely not be a derivative of it as the input is still original.

That's an incorrect statement: if the output is a derivative of the
training (which is a big if given the current state of the legal
landscape) and the training set was copyrighted, then even a translated
text using that training data will pick up the copyright violation
regardless of input prompting.

Regards,

James

next prev parent reply	other threads:[~2025-12-08  3:42 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-05 16:03 Lorenzo Stoakes
2025-08-05 16:43 ` James Bottomley
2025-08-05 17:11   ` Mark Brown
2025-08-05 17:23     ` James Bottomley
2025-08-05 17:43       ` Sasha Levin
2025-08-05 17:58         ` Lorenzo Stoakes
2025-08-05 18:16       ` Mark Brown
2025-08-05 18:01     ` Lorenzo Stoakes
2025-08-05 18:46       ` Mark Brown
2025-08-05 19:18         ` Lorenzo Stoakes
2025-08-05 17:17   ` Stephen Hemminger
2025-08-05 17:55   ` Lorenzo Stoakes
2025-08-05 18:23     ` Lorenzo Stoakes
2025-08-12 13:44       ` Steven Rostedt
2025-08-05 18:34     ` James Bottomley
2025-08-05 18:55       ` Lorenzo Stoakes
2025-08-12 13:50       ` Steven Rostedt
2025-08-05 18:39     ` Sasha Levin
2025-08-05 19:15       ` Lorenzo Stoakes
2025-08-05 20:02         ` James Bottomley
2025-08-05 20:48           ` Al Viro
2025-08-06 19:26           ` Lorenzo Stoakes
2025-08-07 12:25             ` Mark Brown
2025-08-07 13:00               ` Lorenzo Stoakes
2025-08-11 21:26                 ` Luis Chamberlain
2025-08-12 14:19                 ` Steven Rostedt
2025-08-06  4:04       ` Alexey Dobriyan
2025-08-06 20:36         ` Sasha Levin
2025-08-05 21:58   ` Jiri Kosina
2025-08-06  6:58     ` Hannes Reinecke
2025-08-06 19:36       ` Lorenzo Stoakes
2025-08-06 19:35     ` Lorenzo Stoakes
2025-08-05 18:10 ` H. Peter Anvin
2025-08-05 18:19   ` Lorenzo Stoakes
2025-08-06  5:49   ` Julia Lawall
2025-08-06  9:25     ` Dan Carpenter
2025-08-06  9:39       ` Julia Lawall
2025-08-06 19:30       ` Lorenzo Stoakes
2025-08-12 14:37         ` Steven Rostedt
2025-08-12 15:02           ` Sasha Levin
2025-08-12 15:24             ` Paul E. McKenney
2025-08-12 15:25               ` Sasha Levin
2025-08-12 15:28                 ` Paul E. McKenney
2025-12-08  1:12 ` Sasha Levin
2025-12-08  1:25   ` H. Peter Anvin
2025-12-08  1:59     ` Jonathan Corbet
2025-12-08  3:15       ` Steven Rostedt
2025-12-08  3:42         ` James Bottomley [this message]
2025-12-08  8:41           ` Mauro Carvalho Chehab
2025-12-08  9:16             ` James Bottomley
2025-12-08 10:22               ` Mauro Carvalho Chehab
2025-12-08  4:15   ` Laurent Pinchart
2025-12-08  4:31     ` Jonathan Corbet
2025-12-08  4:36       ` Laurent Pinchart
2025-12-08  7:00   ` Jiri Kosina
2025-12-08  7:38     ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=88091c9ac1d8f20bade177212445a60c752ba8b5.camel@HansenPartnership.com \
    --to=james.bottomley@hansenpartnership.com \
    --cc=corbet@lwn.net \
    --cc=hpa@zytor.com \
    --cc=ksummit@lists.linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=sashal@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox