ksummit.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
@ 2025-08-05 15:38 Jiri Kosina
  2025-08-05 17:50 ` Sasha Levin
                   ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Jiri Kosina @ 2025-08-05 15:38 UTC (permalink / raw)
  To: ksummit

This proposal is pretty much followup/spinoff of the discussion currently 
happening on LKML in one of the sub-threads of [1].

This is not really about legal aspects of AI-generated code and patches, I 
believe that'd be handled well handled well by LF, DCO, etc.

My concern here is more "human to human", as in "if I need to talk to a 
human that actually does understand the patch deeply enough, in context, 
etc .. who is that?"

I believe we need to at least settle on (and document) the way how to 
express in patch (meta)data:

- this patch has been assisted by LLM $X
- the human understanding the generated code is $Y

We might just implicitly assume this to be the first person in the S-O-B 
chain (which I personally don't think works for all scenarios, you can 
have multiple people working on it, etc), but even in such case I believe 
this needs to be clearly documented.

Plus, to further quote Laurent from that very thread:

===
I'm pretty sure every maintainer keeps a mental list of trust scores, and 
uses them when reviewing patches. Patch submitter who doesn't perform due 
diligence usually lose points, especially if the offences occur repeatedly 
(newcomers often get a few free passes thanks to their inexperience and 
the benefit of the doubt, at least with most maintainers).

LLMs increase the scale of the problem, and also makes it easier to fake 
due diligence. I believe it's important to make it very clear to 
contributors that they will suffer consequences if they don't hold up to 
the standards we expect.
===

[1] https://lore.kernel.org/all/20250727195802.2222764-1-sashal@kernel.org/

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 15:38 [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code Jiri Kosina
@ 2025-08-05 17:50 ` Sasha Levin
  2025-08-05 18:00   ` Laurent Pinchart
                     ` (2 more replies)
  2025-08-06  8:17 ` Dan Carpenter
  2025-09-15 18:01 ` Kees Cook
  2 siblings, 3 replies; 97+ messages in thread
From: Sasha Levin @ 2025-08-05 17:50 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: ksummit

On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
>This proposal is pretty much followup/spinoff of the discussion currently
>happening on LKML in one of the sub-threads of [1].
>
>This is not really about legal aspects of AI-generated code and patches, I
>believe that'd be handled well handled well by LF, DCO, etc.
>
>My concern here is more "human to human", as in "if I need to talk to a
>human that actually does understand the patch deeply enough, in context,
>etc .. who is that?"
>
>I believe we need to at least settle on (and document) the way how to
>express in patch (meta)data:
>
>- this patch has been assisted by LLM $X
>- the human understanding the generated code is $Y
>
>We might just implicitly assume this to be the first person in the S-O-B
>chain (which I personally don't think works for all scenarios, you can
>have multiple people working on it, etc), but even in such case I believe
>this needs to be clearly documented.

The above isn't really an AI problem though.

We already have folks sending "checkpatch fixes" which only make code
less readable or "syzbot fixes" that shut up the warnings but are
completely bogus otherwise.

Sure, folks sending "AI fixes" could (will?) be a growing problem, but
tackling just the AI side of it is addressing one of the symptoms, not
the underlying issue.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 17:50 ` Sasha Levin
@ 2025-08-05 18:00   ` Laurent Pinchart
  2025-08-05 18:16     ` Sasha Levin
                       ` (2 more replies)
  2025-08-05 18:32   ` Lorenzo Stoakes
  2025-08-08  8:31   ` Krzysztof Kozlowski
  2 siblings, 3 replies; 97+ messages in thread
From: Laurent Pinchart @ 2025-08-05 18:00 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jiri Kosina, ksummit

On Tue, Aug 05, 2025 at 01:50:57PM -0400, Sasha Levin wrote:
> On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> >This proposal is pretty much followup/spinoff of the discussion currently
> >happening on LKML in one of the sub-threads of [1].
> >
> >This is not really about legal aspects of AI-generated code and patches, I
> >believe that'd be handled well handled well by LF, DCO, etc.
> >
> >My concern here is more "human to human", as in "if I need to talk to a
> >human that actually does understand the patch deeply enough, in context,
> >etc .. who is that?"
> >
> >I believe we need to at least settle on (and document) the way how to
> >express in patch (meta)data:
> >
> >- this patch has been assisted by LLM $X
> >- the human understanding the generated code is $Y
> >
> >We might just implicitly assume this to be the first person in the S-O-B
> >chain (which I personally don't think works for all scenarios, you can
> >have multiple people working on it, etc), but even in such case I believe
> >this needs to be clearly documented.
> 
> The above isn't really an AI problem though.
> 
> We already have folks sending "checkpatch fixes" which only make code
> less readable or "syzbot fixes" that shut up the warnings but are
> completely bogus otherwise.
> 
> Sure, folks sending "AI fixes" could (will?) be a growing problem, but
> tackling just the AI side of it is addressing one of the symptoms, not
> the underlying issue.

Perfect, let's document a policy and kill two birds with one stone then.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 18:00   ` Laurent Pinchart
@ 2025-08-05 18:16     ` Sasha Levin
  2025-08-05 21:53       ` Jiri Kosina
  2025-08-05 22:41       ` Laurent Pinchart
  2025-08-05 18:34     ` Lorenzo Stoakes
  2025-08-05 22:06     ` Alexandre Belloni
  2 siblings, 2 replies; 97+ messages in thread
From: Sasha Levin @ 2025-08-05 18:16 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Jiri Kosina, ksummit

On Tue, Aug 05, 2025 at 09:00:10PM +0300, Laurent Pinchart wrote:
>On Tue, Aug 05, 2025 at 01:50:57PM -0400, Sasha Levin wrote:
>> On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
>> >This proposal is pretty much followup/spinoff of the discussion currently
>> >happening on LKML in one of the sub-threads of [1].
>> >
>> >This is not really about legal aspects of AI-generated code and patches, I
>> >believe that'd be handled well handled well by LF, DCO, etc.
>> >
>> >My concern here is more "human to human", as in "if I need to talk to a
>> >human that actually does understand the patch deeply enough, in context,
>> >etc .. who is that?"
>> >
>> >I believe we need to at least settle on (and document) the way how to
>> >express in patch (meta)data:
>> >
>> >- this patch has been assisted by LLM $X
>> >- the human understanding the generated code is $Y
>> >
>> >We might just implicitly assume this to be the first person in the S-O-B
>> >chain (which I personally don't think works for all scenarios, you can
>> >have multiple people working on it, etc), but even in such case I believe
>> >this needs to be clearly documented.
>>
>> The above isn't really an AI problem though.
>>
>> We already have folks sending "checkpatch fixes" which only make code
>> less readable or "syzbot fixes" that shut up the warnings but are
>> completely bogus otherwise.
>>
>> Sure, folks sending "AI fixes" could (will?) be a growing problem, but
>> tackling just the AI side of it is addressing one of the symptoms, not
>> the underlying issue.
>
>Perfect, let's document a policy and kill two birds with one stone then.

So I've gone through some of our docs, and we already have the following
in submitting-patches.rst:

	Your patch will almost certainly get comments from reviewers on
	ways in which the patch can be improved, in the form of a reply
	to your email. You must respond to those comments; ignoring
	reviewers is a good way to get ignored in return. You can simply
	reply to their emails to answer their comments. Review comments
	or questions that do not lead to a code change should almost
	certainly bring about a comment or changelog entry so that the
	next reviewer better understands what is going on.

	Be sure to tell the reviewers what changes you are making and to
	thank them for their time.  Code review is a tiring and
	time-consuming process, and reviewers sometimes get grumpy.
	Even in that case, though, respond politely and address the
	problems they have pointed out.  When sending a next version,
	add a ``patch changelog`` to the cover letter or to individual
	patches explaining difference against previous submission (see
	:ref:`the_canonical_patch_format`).  Notify people that
	commented on your patch about new versions by adding them to the
	patches CC list.

In the context of this discussion it's a bit funny: we mandate that
reviews will be responded to, but we don't mandate that the response
will make any sense, which I think is Jiri's point.

The TIP maintainer's handbook (maintainer-tip.rst) actually seems to
tackle this:

    SOBs after the author SOB are from people handling and transporting
    the patch, but were not involved in development. SOB chains should
    reflect the **real** route a patch took as it was propagated to us,
    with the first SOB entry signalling primary authorship of a single
    author.

Should we clarify that this is true for any kernel patches?

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 17:50 ` Sasha Levin
  2025-08-05 18:00   ` Laurent Pinchart
@ 2025-08-05 18:32   ` Lorenzo Stoakes
  2025-08-08  8:31   ` Krzysztof Kozlowski
  2 siblings, 0 replies; 97+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 18:32 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jiri Kosina, ksummit

On Tue, Aug 05, 2025 at 01:50:57PM -0400, Sasha Levin wrote:
> On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > This proposal is pretty much followup/spinoff of the discussion currently
> > happening on LKML in one of the sub-threads of [1].
> >
> > This is not really about legal aspects of AI-generated code and patches, I
> > believe that'd be handled well handled well by LF, DCO, etc.
> >
> > My concern here is more "human to human", as in "if I need to talk to a
> > human that actually does understand the patch deeply enough, in context,
> > etc .. who is that?"
> >
> > I believe we need to at least settle on (and document) the way how to
> > express in patch (meta)data:
> >
> > - this patch has been assisted by LLM $X
> > - the human understanding the generated code is $Y
> >
> > We might just implicitly assume this to be the first person in the S-O-B
> > chain (which I personally don't think works for all scenarios, you can
> > have multiple people working on it, etc), but even in such case I believe
> > this needs to be clearly documented.
>
> The above isn't really an AI problem though.
>
> We already have folks sending "checkpatch fixes" which only make code
> less readable or "syzbot fixes" that shut up the warnings but are
> completely bogus otherwise.
>
> Sure, folks sending "AI fixes" could (will?) be a growing problem, but
> tackling just the AI side of it is addressing one of the symptoms, not
> the underlying issue.

I agree, I think Jiri's proposal is broader than AI - rather it is about
attribution and (in my view CoC-specific) consequences of incorrect
attribution.

However, I think a product of a broader discussion on AI is the production
of a general kernel AI policy which would cover off how attribution would
look _in that case_.

Perhaps a separate document that speaks to attribution as a whole would
also be appropriate?

For the CoC enforcement stuff, I think that is a seperate but possibly
related topic in itself.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 18:00   ` Laurent Pinchart
  2025-08-05 18:16     ` Sasha Levin
@ 2025-08-05 18:34     ` Lorenzo Stoakes
  2025-08-05 22:06     ` Alexandre Belloni
  2 siblings, 0 replies; 97+ messages in thread
From: Lorenzo Stoakes @ 2025-08-05 18:34 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Sasha Levin, Jiri Kosina, ksummit

On Tue, Aug 05, 2025 at 09:00:10PM +0300, Laurent Pinchart wrote:
> Perfect, let's document a policy and kill two birds with one stone then.

I couldn't agree more that having explicit stated policy is a good thing
(TM) :)

I believe that vagueness on policy is a breeding ground for
misunderstanding and problems arising, and clarity is very important on
these things.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 18:16     ` Sasha Levin
@ 2025-08-05 21:53       ` Jiri Kosina
  2025-08-05 22:41       ` Laurent Pinchart
  1 sibling, 0 replies; 97+ messages in thread
From: Jiri Kosina @ 2025-08-05 21:53 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Laurent Pinchart, ksummit

On Tue, 5 Aug 2025, Sasha Levin wrote:

> In the context of this discussion it's a bit funny: we mandate that
> reviews will be responded to, but we don't mandate that the response
> will make any sense, which I think is Jiri's point.

Yeah, indeed, pretty much.

> The TIP maintainer's handbook (maintainer-tip.rst) actually seems to
> tackle this:
> 
>    SOBs after the author SOB are from people handling and transporting
>    the patch, but were not involved in development. SOB chains should
>    reflect the **real** route a patch took as it was propagated to us,
>    with the first SOB entry signalling primary authorship of a single
>    author.
> 
> Should we clarify that this is true for any kernel patches?

It also seems to handle Co-developed-by: in a nice way a few lines above.

I think both of these shouldn't really be specific to tip.git 
documentation, and should be made general.

With this in place (and with the additional requirement of documenting 
that the code/patch has been LLM-assisted), I believe this specific part 
of the problem should be mostly covered.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 18:00   ` Laurent Pinchart
  2025-08-05 18:16     ` Sasha Levin
  2025-08-05 18:34     ` Lorenzo Stoakes
@ 2025-08-05 22:06     ` Alexandre Belloni
  2 siblings, 0 replies; 97+ messages in thread
From: Alexandre Belloni @ 2025-08-05 22:06 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Sasha Levin, Jiri Kosina, ksummit

On 05/08/2025 21:00:10+0300, Laurent Pinchart wrote:
> On Tue, Aug 05, 2025 at 01:50:57PM -0400, Sasha Levin wrote:
> > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > >This proposal is pretty much followup/spinoff of the discussion currently
> > >happening on LKML in one of the sub-threads of [1].
> > >
> > >This is not really about legal aspects of AI-generated code and patches, I
> > >believe that'd be handled well handled well by LF, DCO, etc.
> > >
> > >My concern here is more "human to human", as in "if I need to talk to a
> > >human that actually does understand the patch deeply enough, in context,
> > >etc .. who is that?"
> > >
> > >I believe we need to at least settle on (and document) the way how to
> > >express in patch (meta)data:
> > >
> > >- this patch has been assisted by LLM $X
> > >- the human understanding the generated code is $Y
> > >
> > >We might just implicitly assume this to be the first person in the S-O-B
> > >chain (which I personally don't think works for all scenarios, you can
> > >have multiple people working on it, etc), but even in such case I believe
> > >this needs to be clearly documented.
> > 
> > The above isn't really an AI problem though.
> > 
> > We already have folks sending "checkpatch fixes" which only make code
> > less readable or "syzbot fixes" that shut up the warnings but are
> > completely bogus otherwise.
> > 
> > Sure, folks sending "AI fixes" could (will?) be a growing problem, but
> > tackling just the AI side of it is addressing one of the symptoms, not
> > the underlying issue.
> 
> Perfect, let's document a policy and kill two birds with one stone then.
> 

Yes, I was going to bring up static checkers and the patches generated
to fix warnings that don't make sense. I'd like contributor to
explicitly state they used a tool to find an "issue" and generate the
patch so I could more easily ignore them. For example, we have been
adding plenty of return value checks and error messages for things that
are never going to happen or is they happen, the system is a state so
bad that it will never get to print the string.

-- 
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 18:16     ` Sasha Levin
  2025-08-05 21:53       ` Jiri Kosina
@ 2025-08-05 22:41       ` Laurent Pinchart
  1 sibling, 0 replies; 97+ messages in thread
From: Laurent Pinchart @ 2025-08-05 22:41 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jiri Kosina, ksummit

On Tue, Aug 05, 2025 at 02:16:17PM -0400, Sasha Levin wrote:
> On Tue, Aug 05, 2025 at 09:00:10PM +0300, Laurent Pinchart wrote:
> > On Tue, Aug 05, 2025 at 01:50:57PM -0400, Sasha Levin wrote:
> >> On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> >> > This proposal is pretty much followup/spinoff of the discussion currently
> >> > happening on LKML in one of the sub-threads of [1].
> >> >
> >> > This is not really about legal aspects of AI-generated code and patches, I
> >> > believe that'd be handled well handled well by LF, DCO, etc.
> >> >
> >> > My concern here is more "human to human", as in "if I need to talk to a
> >> > human that actually does understand the patch deeply enough, in context,
> >> > etc .. who is that?"
> >> >
> >> > I believe we need to at least settle on (and document) the way how to
> >> > express in patch (meta)data:
> >> >
> >> > - this patch has been assisted by LLM $X
> >> > - the human understanding the generated code is $Y
> >> >
> >> > We might just implicitly assume this to be the first person in the S-O-B
> >> > chain (which I personally don't think works for all scenarios, you can
> >> > have multiple people working on it, etc), but even in such case I believe
> >> > this needs to be clearly documented.
> >>
> >> The above isn't really an AI problem though.
> >>
> >> We already have folks sending "checkpatch fixes" which only make code
> >> less readable or "syzbot fixes" that shut up the warnings but are
> >> completely bogus otherwise.
> >>
> >> Sure, folks sending "AI fixes" could (will?) be a growing problem, but
> >> tackling just the AI side of it is addressing one of the symptoms, not
> >> the underlying issue.
> >
> > Perfect, let's document a policy and kill two birds with one stone then.
> 
> So I've gone through some of our docs, and we already have the following
> in submitting-patches.rst:
> 
> 	Your patch will almost certainly get comments from reviewers on
> 	ways in which the patch can be improved, in the form of a reply
> 	to your email. You must respond to those comments; ignoring
> 	reviewers is a good way to get ignored in return. You can simply
> 	reply to their emails to answer their comments. Review comments
> 	or questions that do not lead to a code change should almost
> 	certainly bring about a comment or changelog entry so that the
> 	next reviewer better understands what is going on.
> 
> 	Be sure to tell the reviewers what changes you are making and to
> 	thank them for their time.  Code review is a tiring and
> 	time-consuming process, and reviewers sometimes get grumpy.
> 	Even in that case, though, respond politely and address the
> 	problems they have pointed out.  When sending a next version,
> 	add a ``patch changelog`` to the cover letter or to individual
> 	patches explaining difference against previous submission (see
> 	:ref:`the_canonical_patch_format`).  Notify people that
> 	commented on your patch about new versions by adding them to the
> 	patches CC list.
> 
> In the context of this discussion it's a bit funny: we mandate that
> reviews will be responded to, but we don't mandate that the response
> will make any sense, which I think is Jiri's point.

I would consider that strongly implied. Are there contributors who could
in good faith consider that responses that don't make any sense are
perfectly fine ? If we had to state that explicitly, there would be
thousands of other assumptions we would need to document.

What I believe we need to document is the assumptions we make that may
not be self-evident to contributors. I assume that patches I receive are
understood by the author, as well as by the submitter unless stated
otherwise. LLMs may empower new (or existing) contributors to submit
more easily and in larger quantities patches that neither them nor
anyone else understand. If we all think nobody in their right mind would
do that, then there's nothing to document. I think the rule needs to be
stated clearly, as I'm concerned we'll see an increase in such
submissions.

> The TIP maintainer's handbook (maintainer-tip.rst) actually seems to
> tackle this:
> 
>     SOBs after the author SOB are from people handling and transporting
>     the patch, but were not involved in development. SOB chains should
>     reflect the **real** route a patch took as it was propagated to us,
>     with the first SOB entry signalling primary authorship of a single
>     author.
> 
> Should we clarify that this is true for any kernel patches?

This seems to be related to
https://lore.kernel.org/all/20250724072032.118554-1-hendrik.hamerlinck@hammernet.be/

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 15:38 [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code Jiri Kosina
  2025-08-05 17:50 ` Sasha Levin
@ 2025-08-06  8:17 ` Dan Carpenter
  2025-08-06 10:13   ` Mark Brown
  2025-09-15 18:01 ` Kees Cook
  2 siblings, 1 reply; 97+ messages in thread
From: Dan Carpenter @ 2025-08-06  8:17 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: ksummit

Just a "Patch generated with AI" under the --- cut off line would be
fine.

We had a patch in staging from AI which "copy and pasted" from a spec
that it had hallucinated.  The language in the commit message is so
smooth and confident that it took a re-read to see that it's totally
nonsense.  A lot of the patches in staging are from newbies and
sometimes kids and I believe the person who sent the  AI assisted
patch did it with good intentions.  But, ugh, I don't want to deal
with that.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-06  8:17 ` Dan Carpenter
@ 2025-08-06 10:13   ` Mark Brown
  2025-08-12 14:36     ` Ben Dooks
  0 siblings, 1 reply; 97+ messages in thread
From: Mark Brown @ 2025-08-06 10:13 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Jiri Kosina, ksummit

[-- Attachment #1: Type: text/plain, Size: 857 bytes --]

On Wed, Aug 06, 2025 at 11:17:23AM +0300, Dan Carpenter wrote:

> Just a "Patch generated with AI" under the --- cut off line would be
> fine.

> We had a patch in staging from AI which "copy and pasted" from a spec
> that it had hallucinated.  The language in the commit message is so
> smooth and confident that it took a re-read to see that it's totally
> nonsense.  A lot of the patches in staging are from newbies and
> sometimes kids and I believe the person who sent the  AI assisted
> patch did it with good intentions.  But, ugh, I don't want to deal
> with that.

I think the suggestion from an earlier thread that people should say
what the AI they were using (as they tend to for static checkers and
so on) was good - that's useful for both noticing tools that work well
and tracking things down if we notice a pattern of errors with some
tool.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 17:50 ` Sasha Levin
  2025-08-05 18:00   ` Laurent Pinchart
  2025-08-05 18:32   ` Lorenzo Stoakes
@ 2025-08-08  8:31   ` Krzysztof Kozlowski
  2025-08-11 21:46     ` Paul E. McKenney
  2 siblings, 1 reply; 97+ messages in thread
From: Krzysztof Kozlowski @ 2025-08-08  8:31 UTC (permalink / raw)
  To: Sasha Levin, Jiri Kosina; +Cc: ksummit

On 05/08/2025 19:50, Sasha Levin wrote:
> On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
>> This proposal is pretty much followup/spinoff of the discussion currently
>> happening on LKML in one of the sub-threads of [1].
>>
>> This is not really about legal aspects of AI-generated code and patches, I
>> believe that'd be handled well handled well by LF, DCO, etc.
>>
>> My concern here is more "human to human", as in "if I need to talk to a
>> human that actually does understand the patch deeply enough, in context,
>> etc .. who is that?"
>>
>> I believe we need to at least settle on (and document) the way how to
>> express in patch (meta)data:
>>
>> - this patch has been assisted by LLM $X
>> - the human understanding the generated code is $Y
>>
>> We might just implicitly assume this to be the first person in the S-O-B
>> chain (which I personally don't think works for all scenarios, you can
>> have multiple people working on it, etc), but even in such case I believe
>> this needs to be clearly documented.
> 
> The above isn't really an AI problem though.
> 
> We already have folks sending "checkpatch fixes" which only make code
> less readable or "syzbot fixes" that shut up the warnings but are
> completely bogus otherwise.
> 
> Sure, folks sending "AI fixes" could (will?) be a growing problem, but
> tackling just the AI side of it is addressing one of the symptoms, not
> the underlying issue.


I think there is a important difference in process and in result between
using existing tools, like coccinelle, sparse or even checkpatch, and
AI-assisted coding.

For the first you still need to write actual code and since you are
writing it, most likely you will compile it. Even if people fix the
warnings, not the problems, they still at least write the code and thus
this filters at least people who never wrote C.

With AI you do not have to even write it. It will hallucinate, create
some sort of C code and you just send it. No need to compile it even!

We do see poor contributions based on reports from existing tools, like
you mentioned, but the AI can significantly increase the flood of poor
contributions, that's why I call this tool different. And that
difference deserves annotation and treating differently than checkpatch
or coccinelle fixes.

I am all for requirement of marking AI-assisted patches, so I can set up
my filters correctly and ignore GPL-4.0, GPL-6.0 or other hallucinated
code (I already saw such in Devicetree bindings subsystem).

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-08  8:31   ` Krzysztof Kozlowski
@ 2025-08-11 21:46     ` Paul E. McKenney
  2025-08-11 21:57       ` Luck, Tony
                         ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-11 21:46 UTC (permalink / raw)
  To: Krzysztof Kozlowski; +Cc: Sasha Levin, Jiri Kosina, ksummit

On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> On 05/08/2025 19:50, Sasha Levin wrote:
> > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> >> This proposal is pretty much followup/spinoff of the discussion currently
> >> happening on LKML in one of the sub-threads of [1].
> >>
> >> This is not really about legal aspects of AI-generated code and patches, I
> >> believe that'd be handled well handled well by LF, DCO, etc.
> >>
> >> My concern here is more "human to human", as in "if I need to talk to a
> >> human that actually does understand the patch deeply enough, in context,
> >> etc .. who is that?"
> >>
> >> I believe we need to at least settle on (and document) the way how to
> >> express in patch (meta)data:
> >>
> >> - this patch has been assisted by LLM $X
> >> - the human understanding the generated code is $Y
> >>
> >> We might just implicitly assume this to be the first person in the S-O-B
> >> chain (which I personally don't think works for all scenarios, you can
> >> have multiple people working on it, etc), but even in such case I believe
> >> this needs to be clearly documented.
> > 
> > The above isn't really an AI problem though.
> > 
> > We already have folks sending "checkpatch fixes" which only make code
> > less readable or "syzbot fixes" that shut up the warnings but are
> > completely bogus otherwise.
> > 
> > Sure, folks sending "AI fixes" could (will?) be a growing problem, but
> > tackling just the AI side of it is addressing one of the symptoms, not
> > the underlying issue.
> 
> I think there is a important difference in process and in result between
> using existing tools, like coccinelle, sparse or even checkpatch, and
> AI-assisted coding.
> 
> For the first you still need to write actual code and since you are
> writing it, most likely you will compile it. Even if people fix the
> warnings, not the problems, they still at least write the code and thus
> this filters at least people who never wrote C.
> 
> With AI you do not have to even write it. It will hallucinate, create
> some sort of C code and you just send it. No need to compile it even!

Completely agreed, and furthermore, depending on how that AI was
trained, those using that AI's output might have some difficulty meeting
the requirements of the second portion of clause (a) of Developer's
Certificate of Origin (DCO) 1.1: "I have the right to submit it under
the open source license indicated in the file".

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 21:46     ` Paul E. McKenney
@ 2025-08-11 21:57       ` Luck, Tony
  2025-08-11 22:12         ` Paul E. McKenney
  2025-08-11 22:28         ` Sasha Levin
  2025-08-11 22:11       ` Luis Chamberlain
  2025-08-12  8:38       ` James Bottomley
  2 siblings, 2 replies; 97+ messages in thread
From: Luck, Tony @ 2025-08-11 21:57 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > On 05/08/2025 19:50, Sasha Levin wrote:
> > With AI you do not have to even write it. It will hallucinate, create
> > some sort of C code and you just send it. No need to compile it even!
> 
> Completely agreed, and furthermore, depending on how that AI was
> trained, those using that AI's output might have some difficulty meeting
> the requirements of the second portion of clause (a) of Developer's
> Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> the open source license indicated in the file".

Should the rules be:

1) No submissions directly from an AI agent. The From: line must
always refer to a human.

2) The human on the From: line takes full responsibilty for the
contents of the patch. If it is garbage, or broken in some way
there's no fall back on the "but AI wrote that bit".

-Tony

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 21:46     ` Paul E. McKenney
  2025-08-11 21:57       ` Luck, Tony
@ 2025-08-11 22:11       ` Luis Chamberlain
  2025-08-11 22:51         ` Paul E. McKenney
  2025-08-12  8:38       ` James Bottomley
  2 siblings, 1 reply; 97+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:11 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:
> depending on how that AI was
> trained, those using that AI's output might have some difficulty meeting
> the requirements of the second portion of clause (a) of Developer's
> Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> the open source license indicated in the file".

If the argument is that cetain LLM generated code cannot be used for code under
the DCO, then:

a) isn't this debatable? Do we want to itemize a safe list for AI models
   which we think are safe to adopt for AI generated code?
b) seems kind of too late
c) If something like the Generated-by tag is used, and we trust it, then
   if we do want to side against merging AI generated code, that's perhaps our
   only chance at blocking that type of code. Its however not bullet proof.

I'm however not sure if a) hold water. Are folks seriously taking these
positions somewhere?

  Luis

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 21:57       ` Luck, Tony
@ 2025-08-11 22:12         ` Paul E. McKenney
  2025-08-11 22:45           ` H. Peter Anvin
  2025-08-11 22:54           ` Jonathan Corbet
  2025-08-11 22:28         ` Sasha Levin
  1 sibling, 2 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-11 22:12 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 02:57:30PM -0700, Luck, Tony wrote:
> On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > > On 05/08/2025 19:50, Sasha Levin wrote:
> > > With AI you do not have to even write it. It will hallucinate, create
> > > some sort of C code and you just send it. No need to compile it even!
> > 
> > Completely agreed, and furthermore, depending on how that AI was
> > trained, those using that AI's output might have some difficulty meeting
> > the requirements of the second portion of clause (a) of Developer's
> > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> > the open source license indicated in the file".
> 
> Should the rules be:
> 
> 1) No submissions directly from an AI agent. The From: line must
> always refer to a human.
> 
> 2) The human on the From: line takes full responsibilty for the
> contents of the patch. If it is garbage, or broken in some way
> there's no fall back on the "but AI wrote that bit".

Another option is "The AI was trained only on input having a compatible
license."  Which, to your point, would to the best of my knowledge cut
out all of the popular and easily available AIs.

There might well be less restrictive conditions on the AI training data,
but I am not qualified to evaluate such conditions, let alone construct
them.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 21:57       ` Luck, Tony
  2025-08-11 22:12         ` Paul E. McKenney
@ 2025-08-11 22:28         ` Sasha Levin
  2025-08-12 15:49           ` Steven Rostedt
  1 sibling, 1 reply; 97+ messages in thread
From: Sasha Levin @ 2025-08-11 22:28 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Paul E. McKenney, Krzysztof Kozlowski, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 02:57:30PM -0700, Luck, Tony wrote:
>On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:
>> On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
>> > On 05/08/2025 19:50, Sasha Levin wrote:
>> > With AI you do not have to even write it. It will hallucinate, create
>> > some sort of C code and you just send it. No need to compile it even!
>>
>> Completely agreed, and furthermore, depending on how that AI was
>> trained, those using that AI's output might have some difficulty meeting
>> the requirements of the second portion of clause (a) of Developer's
>> Certificate of Origin (DCO) 1.1: "I have the right to submit it under
>> the open source license indicated in the file".
>
>Should the rules be:
>
>1) No submissions directly from an AI agent. The From: line must
>always refer to a human.

We already require that, no?

We have the following in our docs:

         code from contributors without a known identity or anonymous
         contributors will not be accepted. All contributors are required
         to "sign off" on their code

Which requires a real, known, human identity behind the "Signed-off-by"
tag.

I don't think anyone here interperted it differently, but we can maybe
ask the LF legal folks if that's the case (and help us improve it) if
you think that this interpertation is not clear.

>2) The human on the From: line takes full responsibilty for the
>contents of the patch. If it is garbage, or broken in some way
>there's no fall back on the "but AI wrote that bit".

So right now the bigger issue the community has faced from this aspect
is mindless checkpatch/syzbot/etc "fixes".

We already have extensive requirements in the docs that often just get
ignored. Look at submit-checklist.rst, authors must:

1. Test with multiple debug configs.
2. Exercise all code paths with lockdep
3. Test with fault injection.
4. Verify build against linux-next.
5. Code builds cleanly with no warnings.
6. Passes all[yes,no,mod]config builds.
7. Builds on multiple CPU archs.
8. Follows coding style.
9. Keeps checkpatch.pl happy.
10. Checked with sparse.
11. Justifies any violations.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:12         ` Paul E. McKenney
@ 2025-08-11 22:45           ` H. Peter Anvin
  2025-08-11 22:52             ` Paul E. McKenney
  2025-08-11 22:54           ` Jonathan Corbet
  1 sibling, 1 reply; 97+ messages in thread
From: H. Peter Anvin @ 2025-08-11 22:45 UTC (permalink / raw)
  To: paulmck, Paul E. McKenney, Luck, Tony
  Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On August 11, 2025 3:12:25 PM PDT, "Paul E. McKenney" <paulmck@kernel.org> wrote:
>On Mon, Aug 11, 2025 at 02:57:30PM -0700, Luck, Tony wrote:
>> On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:
>> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
>> > > On 05/08/2025 19:50, Sasha Levin wrote:
>> > > With AI you do not have to even write it. It will hallucinate, create
>> > > some sort of C code and you just send it. No need to compile it even!
>> > 
>> > Completely agreed, and furthermore, depending on how that AI was
>> > trained, those using that AI's output might have some difficulty meeting
>> > the requirements of the second portion of clause (a) of Developer's
>> > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
>> > the open source license indicated in the file".
>> 
>> Should the rules be:
>> 
>> 1) No submissions directly from an AI agent. The From: line must
>> always refer to a human.
>> 
>> 2) The human on the From: line takes full responsibilty for the
>> contents of the patch. If it is garbage, or broken in some way
>> there's no fall back on the "but AI wrote that bit".
>
>Another option is "The AI was trained only on input having a compatible
>license."  Which, to your point, would to the best of my knowledge cut
>out all of the popular and easily available AIs.
>
>There might well be less restrictive conditions on the AI training data,
>but I am not qualified to evaluate such conditions, let alone construct
>them.
>
>							Thanx, Paul
>

I think we need legal advice on this, but I think this is a *really* important issue. It could end up being a very ugly mess otherwise.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:11       ` Luis Chamberlain
@ 2025-08-11 22:51         ` Paul E. McKenney
  2025-08-11 23:22           ` Luis Chamberlain
                             ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-11 22:51 UTC (permalink / raw)
  To: Luis Chamberlain; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:
> > depending on how that AI was
> > trained, those using that AI's output might have some difficulty meeting
> > the requirements of the second portion of clause (a) of Developer's
> > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> > the open source license indicated in the file".
> 
> If the argument is that cetain LLM generated code cannot be used for code under
> the DCO, then:
> 
> a) isn't this debatable? Do we want to itemize a safe list for AI models
>    which we think are safe to adopt for AI generated code?

For my own work, I will continue to avoid use of AI-generated artifacts
for open-source software projects unless and until some of the more
consequential "debates" are resolved favorably.

> b) seems kind of too late

Why?

> c) If something like the Generated-by tag is used, and we trust it, then
>    if we do want to side against merging AI generated code, that's perhaps our
>    only chance at blocking that type of code. Its however not bullet proof.

Nothing is bullet proof.  ;-)

At the same time, I have no idea whether or not a Generated-by tag is
a good idea.

> I'm however not sure if a) hold water. Are folks seriously taking these
> positions somewhere?

I am seriously taking that position for my own work and will continue
to do so until further notice.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:45           ` H. Peter Anvin
@ 2025-08-11 22:52             ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-11 22:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luck, Tony, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 03:45:54PM -0700, H. Peter Anvin wrote:
> On August 11, 2025 3:12:25 PM PDT, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >On Mon, Aug 11, 2025 at 02:57:30PM -0700, Luck, Tony wrote:
> >> On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:
> >> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> >> > > On 05/08/2025 19:50, Sasha Levin wrote:
> >> > > With AI you do not have to even write it. It will hallucinate, create
> >> > > some sort of C code and you just send it. No need to compile it even!
> >> > 
> >> > Completely agreed, and furthermore, depending on how that AI was
> >> > trained, those using that AI's output might have some difficulty meeting
> >> > the requirements of the second portion of clause (a) of Developer's
> >> > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> >> > the open source license indicated in the file".
> >> 
> >> Should the rules be:
> >> 
> >> 1) No submissions directly from an AI agent. The From: line must
> >> always refer to a human.
> >> 
> >> 2) The human on the From: line takes full responsibilty for the
> >> contents of the patch. If it is garbage, or broken in some way
> >> there's no fall back on the "but AI wrote that bit".
> >
> >Another option is "The AI was trained only on input having a compatible
> >license."  Which, to your point, would to the best of my knowledge cut
> >out all of the popular and easily available AIs.
> >
> >There might well be less restrictive conditions on the AI training data,
> >but I am not qualified to evaluate such conditions, let alone construct
> >them.
> 
> I think we need legal advice on this, but I think this is a *really*
> important issue. It could end up being a very ugly mess otherwise.

Indeed, one of the reasonsss that I am not qualified is that I am
no lawyer.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:12         ` Paul E. McKenney
  2025-08-11 22:45           ` H. Peter Anvin
@ 2025-08-11 22:54           ` Jonathan Corbet
  2025-08-11 23:03             ` Paul E. McKenney
  1 sibling, 1 reply; 97+ messages in thread
From: Jonathan Corbet @ 2025-08-11 22:54 UTC (permalink / raw)
  To: paulmck, Luck, Tony
  Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

"Paul E. McKenney" <paulmck@kernel.org> writes:

> Another option is "The AI was trained only on input having a compatible
> license."  Which, to your point, would to the best of my knowledge cut
> out all of the popular and easily available AIs.

That option, of course, opens a separate barrel of worms: if we are
relying on the system having been trained only on compatibly licensed
material, then our ability to distribute the result depends on our
complying with the relevant licenses, right?  Including little details
like preserving copyright notices...?

Somehow, I don't really think that this option brings us much joy.

jon

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:54           ` Jonathan Corbet
@ 2025-08-11 23:03             ` Paul E. McKenney
  2025-08-12 15:47               ` Steven Rostedt
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-11 23:03 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Luck, Tony, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 04:54:38PM -0600, Jonathan Corbet wrote:
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> 
> > Another option is "The AI was trained only on input having a compatible
> > license."  Which, to your point, would to the best of my knowledge cut
> > out all of the popular and easily available AIs.
> 
> That option, of course, opens a separate barrel of worms: if we are
> relying on the system having been trained only on compatibly licensed
> material, then our ability to distribute the result depends on our
> complying with the relevant licenses, right?  Including little details
> like preserving copyright notices...?
> 
> Somehow, I don't really think that this option brings us much joy.

All fair points!

At the same time, I freely confess that I am not yet seeing an option
that brings us much joy, at least for values of "joy" that include actual
incorporation of AI/ML source-code output into the Linux kernel.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:51         ` Paul E. McKenney
@ 2025-08-11 23:22           ` Luis Chamberlain
  2025-08-11 23:42             ` Paul E. McKenney
  2025-08-18 21:41             ` Mauro Carvalho Chehab
  2025-08-12 16:01           ` Steven Rostedt
  2025-08-18 21:23           ` Mauro Carvalho Chehab
  2 siblings, 2 replies; 97+ messages in thread
From: Luis Chamberlain @ 2025-08-11 23:22 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 03:51:48PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > b) seems kind of too late
> 
> Why?

One cannot assume at this point AI generated code has not been merged
into any large scale open source project.

I am also not sure it can be stopped.

> > c) If something like the Generated-by tag is used, and we trust it, then
> >    if we do want to side against merging AI generated code, that's perhaps our
> >    only chance at blocking that type of code. Its however not bullet proof.
> 
> Nothing is bullet proof.  ;-)

Agreed, and I think the legal concerns over AI code use are just as weak. I
just don't see it holidng up long term.

My expectations are that eventually foundation AI models will simply state they
use permissively licensed code for training, and be done with these concerns.

Until then -- we just have wild speculations and I can't see any
sensible case ending up in court wanting to avoid AI code in open source.

> At the same time, I have no idea whether or not a Generated-by tag is
> a good idea.

I do that to make it crystal clear for a project maintainer when I use it.

  Luis

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 23:22           ` Luis Chamberlain
@ 2025-08-11 23:42             ` Paul E. McKenney
  2025-08-12  0:02               ` Luis Chamberlain
  2025-08-18 21:41             ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-11 23:42 UTC (permalink / raw)
  To: Luis Chamberlain; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 04:22:21PM -0700, Luis Chamberlain wrote:
> On Mon, Aug 11, 2025 at 03:51:48PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > > b) seems kind of too late
> > 
> > Why?
> 
> One cannot assume at this point AI generated code has not been merged
> into any large scale open source project.

I agree that it is quite possible that AI-generated code has already been
merged into large-scale open-source projects, including the Linux kernel.
I do not see why this possibility requires us to merge AI-generated code
in the future.

> I am also not sure it can be stopped.

As noted below, nothing is bullet proof.

> > > c) If something like the Generated-by tag is used, and we trust it, then
> > >    if we do want to side against merging AI generated code, that's perhaps our
> > >    only chance at blocking that type of code. Its however not bullet proof.
> > 
> > Nothing is bullet proof.  ;-)
> 
> Agreed, and I think the legal concerns over AI code use are just as weak. I
> just don't see it holidng up long term.

That is quite possible.  But on what are you basing that legal opinion?

Also, even if you have a valid legal opinion that stands up long-term,
situations that prove to be just fine in the long term can be extremely
uncomfortable in the meantime.

> My expectations are that eventually foundation AI models will simply state they
> use permissively licensed code for training, and be done with these concerns.
> 
> Until then -- we just have wild speculations and I can't see any
> sensible case ending up in court wanting to avoid AI code in open source.

I don't know about open source, but they tell me that related cases are
already in court.  Yes, there was a recent decision that was favorable
to your position, which is great, but not necessarily either definitive
or final.

> > At the same time, I have no idea whether or not a Generated-by tag is
> > a good idea.
> 
> I do that to make it crystal clear for a project maintainer when I use it.

I understand and sympathize with your intent, but I do not have an
informed opinion on the risks in either direction.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 23:42             ` Paul E. McKenney
@ 2025-08-12  0:02               ` Luis Chamberlain
  2025-08-12  2:49                 ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Luis Chamberlain @ 2025-08-12  0:02 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 04:42:18PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 11, 2025 at 04:22:21PM -0700, Luis Chamberlain wrote:
> > On Mon, Aug 11, 2025 at 03:51:48PM -0700, Paul E. McKenney wrote:
> > > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > > > c) If something like the Generated-by tag is used, and we trust it, then
> > > >    if we do want to side against merging AI generated code, that's perhaps our
> > > >    only chance at blocking that type of code. Its however not bullet proof.
> > > 
> > > Nothing is bullet proof.  ;-)
> > 
> > Agreed, and I think the legal concerns over AI code use are just as weak. I
> > just don't see it holidng up long term.
> 
> That is quite possible.  But on what are you basing that legal opinion?

Its not a legal opinion. Its a personal opinion based on projections on
growth, adoption, and personal risk analysis, and valuation for my own projects.
At some point a project needs to take a positon on this, I had decide
sooner for another project.

> > My expectations are that eventually foundation AI models will simply state they
> > use permissively licensed code for training, and be done with these concerns.
> > 
> > Until then -- we just have wild speculations and I can't see any
> > sensible case ending up in court wanting to avoid AI code in open source.
> 
> I don't know about open source, but they tell me that related cases are
> already in court.  Yes, there was a recent decision that was favorable
> to your position, which is great, but not necessarily either definitive
> or final.

Indeed, its a risk assessment in the end.

Let us take an example. If one is using foundation models perhaps the
ugliest position you could be in, is if you want to avoid GPL code on a
non-GPL codebase. Since we don't have access to AI model training
logistics, if we just work out of the code on Github the numbers I
came up with was about 60% permissively licensed code, 25% GPL, 15%
unclear. Give or take.

If you're using copyleft code though, well, the project is already open.
So what's the risk assessment? Well who and why would they go after your
project? My risk assessment for my projet is low, and due the high
empirical value I already see in leveraging AI code, I think its worth to
embrace.

Eventually I predict foundation models will just take a position to
annotate where what code they train their models on and I suspect that
will be permissively licensed code. By the time this happens most of the
code we know written by humans will have been replaced already.

  Luis

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12  0:02               ` Luis Chamberlain
@ 2025-08-12  2:49                 ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-12  2:49 UTC (permalink / raw)
  To: Luis Chamberlain; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 11, 2025 at 05:02:19PM -0700, Luis Chamberlain wrote:
> On Mon, Aug 11, 2025 at 04:42:18PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 11, 2025 at 04:22:21PM -0700, Luis Chamberlain wrote:
> > > On Mon, Aug 11, 2025 at 03:51:48PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > > > > c) If something like the Generated-by tag is used, and we trust it, then
> > > > >    if we do want to side against merging AI generated code, that's perhaps our
> > > > >    only chance at blocking that type of code. Its however not bullet proof.
> > > > 
> > > > Nothing is bullet proof.  ;-)
> > > 
> > > Agreed, and I think the legal concerns over AI code use are just as weak. I
> > > just don't see it holidng up long term.
> > 
> > That is quite possible.  But on what are you basing that legal opinion?
> 
> Its not a legal opinion. Its a personal opinion based on projections on
> growth, adoption, and personal risk analysis, and valuation for my own projects.
> At some point a project needs to take a positon on this, I had decide
> sooner for another project.

Your project, your opinion, so questions asked.  From me, anyway.

But...

> > > My expectations are that eventually foundation AI models will simply state they
> > > use permissively licensed code for training, and be done with these concerns.
> > > 
> > > Until then -- we just have wild speculations and I can't see any
> > > sensible case ending up in court wanting to avoid AI code in open source.
> > 
> > I don't know about open source, but they tell me that related cases are
> > already in court.  Yes, there was a recent decision that was favorable
> > to your position, which is great, but not necessarily either definitive
> > or final.
> 
> Indeed, its a risk assessment in the end.
> 
> Let us take an example. If one is using foundation models perhaps the
> ugliest position you could be in, is if you want to avoid GPL code on a
> non-GPL codebase. Since we don't have access to AI model training
> logistics, if we just work out of the code on Github the numbers I
> came up with was about 60% permissively licensed code, 25% GPL, 15%
> unclear. Give or take.
> 
> If you're using copyleft code though, well, the project is already open.
> So what's the risk assessment? Well who and why would they go after your
> project? My risk assessment for my projet is low, and due the high
> empirical value I already see in leveraging AI code, I think its worth to
> embrace.

Sadly, there is precedent for people going after copyleft projects.

> Eventually I predict foundation models will just take a position to
> annotate where what code they train their models on and I suspect that
> will be permissively licensed code. By the time this happens most of the
> code we know written by humans will have been replaced already.

Perhaps.  But on the other hand, there is a lot of code still in use
that was written by humans who have long since passed on.  So I am not
convinced that code replacement will happen all that quickly.

Or maybe you are saying that it will be a good long time before AI
projects implement the kind of traceability that we are discussing?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 21:46     ` Paul E. McKenney
  2025-08-11 21:57       ` Luck, Tony
  2025-08-11 22:11       ` Luis Chamberlain
@ 2025-08-12  8:38       ` James Bottomley
  2025-08-12 13:15         ` Bird, Tim
                           ` (3 more replies)
  2 siblings, 4 replies; 97+ messages in thread
From: James Bottomley @ 2025-08-12  8:38 UTC (permalink / raw)
  To: paulmck, Krzysztof Kozlowski; +Cc: Sasha Levin, Jiri Kosina, ksummit

On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > On 05/08/2025 19:50, Sasha Levin wrote:
> > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > > > This proposal is pretty much followup/spinoff of the discussion
> > > > currently happening on LKML in one of the sub-threads of [1].
> > > > 
> > > > This is not really about legal aspects of AI-generated code and
> > > > patches, I believe that'd be handled well handled well by LF,
> > > > DCO, etc.
> > > > 
> > > > My concern here is more "human to human", as in "if I need to
> > > > talk to a human that actually does understand the patch deeply
> > > > enough, in context, etc .. who is that?"
> > > > 
> > > > I believe we need to at least settle on (and document) the way
> > > > how to express in patch (meta)data:
> > > > 
> > > > - this patch has been assisted by LLM $X
> > > > - the human understanding the generated code is $Y
> > > > 
> > > > We might just implicitly assume this to be the first person in
> > > > the S-O-B chain (which I personally don't think works for all
> > > > scenarios, you can have multiple people working on it, etc),
> > > > but even in such case I believe this needs to be clearly
> > > > documented.
> > > 
> > > The above isn't really an AI problem though.
> > > 
> > > We already have folks sending "checkpatch fixes" which only make
> > > code less readable or "syzbot fixes" that shut up the warnings
> > > but are completely bogus otherwise.
> > > 
> > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > problem, but tackling just the AI side of it is addressing one of
> > > the symptoms, not the underlying issue.
> > 
> > I think there is a important difference in process and in result
> > between using existing tools, like coccinelle, sparse or even
> > checkpatch, and AI-assisted coding.
> > 
> > For the first you still need to write actual code and since you are
> > writing it, most likely you will compile it. Even if people fix the
> > warnings, not the problems, they still at least write the code and
> > thus this filters at least people who never wrote C.
> > 
> > With AI you do not have to even write it. It will hallucinate,
> > create some sort of C code and you just send it. No need to compile
> > it even!
> 
> Completely agreed, and furthermore, depending on how that AI was
> trained, those using that AI's output might have some difficulty
> meeting the requirements of the second portion of clause (a) of
> Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> submit it under the open source license indicated in the file".

Just on the legality of this.  Under US Law, provided the output isn't
a derivative work (and all the suits over training data have so far
failed to prove that it is), copyright in an AI created piece of code,
actually doesn't exist because a non human entity can't legally hold
copyright of a work.  The US copyright office has actually issued this
opinion (huge 3 volume report):

https://www.copyright.gov/ai/

But amazingly enough congress has a more succinct summary:

https://www.congress.gov/crs-product/LSB10922

But the bottom line is that pure AI generated code is effectively
uncopyrightable and therefore public domain which means anyone
definitely has the right to submit it to the kernel under the DCO.

I imagine this situation might be changed by legislation in the future
when people want to monetize AI output, but such a change can't be
retroactive, so for now we're OK legally to accept pure AI code with
the signoff of the submitter (and whatever AI annotation tags we come
up with).

Of course if you take AI output and modify it before submitting, then
the modifications do have copyright (provided a human made them).

Regards,

James


^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12  8:38       ` James Bottomley
@ 2025-08-12 13:15         ` Bird, Tim
  2025-08-12 14:31           ` Greg KH
  2025-08-18 21:12           ` Mauro Carvalho Chehab
  2025-08-12 14:42         ` Paul E. McKenney
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 97+ messages in thread
From: Bird, Tim @ 2025-08-12 13:15 UTC (permalink / raw)
  To: James Bottomley, paulmck, Krzysztof Kozlowski
  Cc: Sasha Levin, Jiri Kosina, ksummit



> -----Original Message-----
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
> On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > > On 05/08/2025 19:50, Sasha Levin wrote:
> > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > >
> > > > > This is not really about legal aspects of AI-generated code and
> > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > DCO, etc.
> > > > >
> > > > > My concern here is more "human to human", as in "if I need to
> > > > > talk to a human that actually does understand the patch deeply
> > > > > enough, in context, etc .. who is that?"
> > > > >
> > > > > I believe we need to at least settle on (and document) the way
> > > > > how to express in patch (meta)data:
> > > > >
> > > > > - this patch has been assisted by LLM $X
> > > > > - the human understanding the generated code is $Y
> > > > >
> > > > > We might just implicitly assume this to be the first person in
> > > > > the S-O-B chain (which I personally don't think works for all
> > > > > scenarios, you can have multiple people working on it, etc),
> > > > > but even in such case I believe this needs to be clearly
> > > > > documented.
> > > >
> > > > The above isn't really an AI problem though.
> > > >
> > > > We already have folks sending "checkpatch fixes" which only make
> > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > but are completely bogus otherwise.
> > > >
> > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > problem, but tackling just the AI side of it is addressing one of
> > > > the symptoms, not the underlying issue.
> > >
> > > I think there is a important difference in process and in result
> > > between using existing tools, like coccinelle, sparse or even
> > > checkpatch, and AI-assisted coding.
> > >
> > > For the first you still need to write actual code and since you are
> > > writing it, most likely you will compile it. Even if people fix the
> > > warnings, not the problems, they still at least write the code and
> > > thus this filters at least people who never wrote C.
> > >
> > > With AI you do not have to even write it. It will hallucinate,
> > > create some sort of C code and you just send it. No need to compile
> > > it even!
> >
> > Completely agreed, and furthermore, depending on how that AI was
> > trained, those using that AI's output might have some difficulty
> > meeting the requirements of the second portion of clause (a) of
> > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > submit it under the open source license indicated in the file".
> 
> Just on the legality of this.  Under US Law, provided the output isn't
> a derivative work (and all the suits over training data have so far
> failed to prove that it is),

This is indeed so.  I have followed the GitHub copilot litigation
(see https://githubcopilotlitigation.com/case-updates.html), and a few
other cases related to whether AI output violates the copyright of the training
data (that is, is a form of derivative work).  I'm not a lawyer, but the legal
reasoning for judgements passed down so far have been, IMHO, atrocious.
Some claims have been thrown out because the output was not identical
to the training data (even when things like comments from the code in
the training data were copied verbatim into the output).  Companies doing
AI code generation now scrub their outputs to make sure nothing
in the output is identical to material in the training data.  However, I'm not
sure this is enough, and this requirement for identicality (to prove derivative work)
is problematic, when copyright law only requires proof of substantial similarity.

The copilot case is going through appeal now, and I wouldn't bet on which
way the outcome will drop.  It could very well yet result that AI output is deemed
to be derivative work of the training data in some cases.  If that occurs, then even restricting
training data to GPL code wouldn't be a sufficient workaround to enable using the AI output
in the kernel.  And, as has been stated elsewhere, there are no currently no major models restricting
their code training data to permissively licensed code.  This makes it infeasible to use
any of the popular models with a high degree of certainty that the output is legally OK.

No legal pun intended, but I think the jury is still out on this issue, and I think it
would be wise to be EXTREMELY cautious introducing AI-generated code into the kernel.
I personally would not submit something for inclusion into the kernel proper that
was AI-generated.  Generation of tools or tests is, IMO, a different matter and I'm
less concerned about that.

Getting back to the discussion at hand, I believe that annotating that a contribution was
AI-generated (or that AI was involved) will at least give us some assistance to re-review
the code and possibly remove or replace it should the legal status of AI-generated code
become problematic in the future.

There is also value in flagging that additional scrutiny may be warranted
at the time of submission.  So I like the idea in principal.

 -- Tim

> copyright in an AI created piece of code,
> actually doesn't exist because a non human entity can't legally hold
> copyright of a work.  The US copyright office has actually issued this
> opinion (huge 3 volume report):
> 
> https://urldefense.com/v3/__https://www.copyright.gov/ai/__;!!O7_YSHcmd9jp3hj_4dEAcyQ!2VMaxMOBIYDHma42N7zDgm5AoJR9Mu4lT0
> _3G6qm0AjSWcqMDjQa7ydTFdLDYUvDE5d9eJtkwIRAO_Kok3fq0KFnCte1js36oeQ$
> 
> But amazingly enough congress has a more succinct summary:
> 
> https://urldefense.com/v3/__https://www.congress.gov/crs-
> product/LSB10922__;!!O7_YSHcmd9jp3hj_4dEAcyQ!2VMaxMOBIYDHma42N7zDgm5AoJR9Mu4lT0_3G6qm0AjSWcqMDjQa7ydTFdLDYUvDE5
> d9eJtkwIRAO_Kok3fq0KFnCte18GKQTDs$
> 
> But the bottom line is that pure AI generated code is effectively
> uncopyrightable and therefore public domain which means anyone
> definitely has the right to submit it to the kernel under the DCO.
> 
> I imagine this situation might be changed by legislation in the future
> when people want to monetize AI output, but such a change can't be
> retroactive, so for now we're OK legally to accept pure AI code with
> the signoff of the submitter (and whatever AI annotation tags we come
> up with).
> 
> Of course if you take AI output and modify it before submitting, then
> the modifications do have copyright (provided a human made them).
> 
> Regards,
> 
> James
> 


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 13:15         ` Bird, Tim
@ 2025-08-12 14:31           ` Greg KH
  2025-08-18 21:12           ` Mauro Carvalho Chehab
  1 sibling, 0 replies; 97+ messages in thread
From: Greg KH @ 2025-08-12 14:31 UTC (permalink / raw)
  To: Bird, Tim
  Cc: James Bottomley, paulmck, Krzysztof Kozlowski, Sasha Levin,
	Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 01:15:33PM +0000, Bird, Tim wrote:
> The copilot case is going through appeal now, and I wouldn't bet on which
> way the outcome will drop.  It could very well yet result that AI output is deemed
> to be derivative work of the training data in some cases.  If that occurs, then even restricting
> training data to GPL code wouldn't be a sufficient workaround to enable using the AI output
> in the kernel.  And, as has been stated elsewhere, there are no currently no major models restricting
> their code training data to permissively licensed code.  This makes it infeasible to use
> any of the popular models with a high degree of certainty that the output is legally OK.

As semi-proof of this, everyone will note that the closed operating
system developer teams have NOT put their code into these public coding
models, NOR do they allow AI tools trained on these public models to be
checked into their closed source codebases.

So I strongly suggest that perhaps until that happens, we too should
seriously consider that the legal issues involved be taken semi-seriously :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-06 10:13   ` Mark Brown
@ 2025-08-12 14:36     ` Ben Dooks
  0 siblings, 0 replies; 97+ messages in thread
From: Ben Dooks @ 2025-08-12 14:36 UTC (permalink / raw)
  To: Mark Brown, Dan Carpenter; +Cc: Jiri Kosina, ksummit

On 06/08/2025 11:13, Mark Brown wrote:
> On Wed, Aug 06, 2025 at 11:17:23AM +0300, Dan Carpenter wrote:
> 
>> Just a "Patch generated with AI" under the --- cut off line would be
>> fine.
> 
>> We had a patch in staging from AI which "copy and pasted" from a spec
>> that it had hallucinated.  The language in the commit message is so
>> smooth and confident that it took a re-read to see that it's totally
>> nonsense.  A lot of the patches in staging are from newbies and
>> sometimes kids and I believe the person who sent the  AI assisted
>> patch did it with good intentions.  But, ugh, I don't want to deal
>> with that.
> 
> I think the suggestion from an earlier thread that people should say
> what the AI they were using (as they tend to for static checkers and
> so on) was good - that's useful for both noticing tools that work well
> and tracking things down if we notice a pattern of errors with some
> tool.

Also, if AI is used, then how was it used? Keeping the inputs may also
be useful ?

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12  8:38       ` James Bottomley
  2025-08-12 13:15         ` Bird, Tim
@ 2025-08-12 14:42         ` Paul E. McKenney
  2025-08-12 15:55           ` Laurent Pinchart
  2025-08-18 21:07           ` Mauro Carvalho Chehab
  2025-08-18 17:53         ` Rafael J. Wysocki
  2025-08-18 19:13         ` Mauro Carvalho Chehab
  3 siblings, 2 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-12 14:42 UTC (permalink / raw)
  To: James Bottomley; +Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 09:38:12AM +0100, James Bottomley wrote:
> On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > > On 05/08/2025 19:50, Sasha Levin wrote:
> > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > 
> > > > > This is not really about legal aspects of AI-generated code and
> > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > DCO, etc.
> > > > > 
> > > > > My concern here is more "human to human", as in "if I need to
> > > > > talk to a human that actually does understand the patch deeply
> > > > > enough, in context, etc .. who is that?"
> > > > > 
> > > > > I believe we need to at least settle on (and document) the way
> > > > > how to express in patch (meta)data:
> > > > > 
> > > > > - this patch has been assisted by LLM $X
> > > > > - the human understanding the generated code is $Y
> > > > > 
> > > > > We might just implicitly assume this to be the first person in
> > > > > the S-O-B chain (which I personally don't think works for all
> > > > > scenarios, you can have multiple people working on it, etc),
> > > > > but even in such case I believe this needs to be clearly
> > > > > documented.
> > > > 
> > > > The above isn't really an AI problem though.
> > > > 
> > > > We already have folks sending "checkpatch fixes" which only make
> > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > but are completely bogus otherwise.
> > > > 
> > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > problem, but tackling just the AI side of it is addressing one of
> > > > the symptoms, not the underlying issue.
> > > 
> > > I think there is a important difference in process and in result
> > > between using existing tools, like coccinelle, sparse or even
> > > checkpatch, and AI-assisted coding.
> > > 
> > > For the first you still need to write actual code and since you are
> > > writing it, most likely you will compile it. Even if people fix the
> > > warnings, not the problems, they still at least write the code and
> > > thus this filters at least people who never wrote C.
> > > 
> > > With AI you do not have to even write it. It will hallucinate,
> > > create some sort of C code and you just send it. No need to compile
> > > it even!
> > 
> > Completely agreed, and furthermore, depending on how that AI was
> > trained, those using that AI's output might have some difficulty
> > meeting the requirements of the second portion of clause (a) of
> > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > submit it under the open source license indicated in the file".
> 
> Just on the legality of this.  Under US Law, provided the output isn't
> a derivative work (and all the suits over training data have so far
> failed to prove that it is), copyright in an AI created piece of code,
> actually doesn't exist because a non human entity can't legally hold
> copyright of a work.  The US copyright office has actually issued this
> opinion (huge 3 volume report):
> 
> https://www.copyright.gov/ai/
> 
> But amazingly enough congress has a more succinct summary:
> 
> https://www.congress.gov/crs-product/LSB10922

Indeed:

	While the Constitution and Copyright Act do not explicitly define
	who (or what) may be an "author," U.S. courts to date have not
	recognized copyright in works that lack a human author—including
	works created autonomously by AI systems.

Please note the "U.S. courts *to* *date*".  :-(

> But the bottom line is that pure AI generated code is effectively
> uncopyrightable and therefore public domain which means anyone
> definitely has the right to submit it to the kernel under the DCO.
> 
> I imagine this situation might be changed by legislation in the future
> when people want to monetize AI output, but such a change can't be
> retroactive, so for now we're OK legally to accept pure AI code with
> the signoff of the submitter (and whatever AI annotation tags we come
> up with).

Except that the USA is a case-law jurisdiction, and changes
in interpretation of existing laws can be and have been applied
retroactively, give or take things like statutes of limitations.  And we
need to worry about more than just USA law.

And I do agree that many of the lawsuits seem to be motivated by an
overwhelmening desire to monetize the output of AI that was induced by
someone else's prompts, if that is what you are getting at.  It does seem
to me personally that after you have sliced and diced the training data,
fair use should apply, but last I checked, fair use was a USA-only thing.

> Of course if you take AI output and modify it before submitting, then
> the modifications do have copyright (provided a human made them).

Agreed, in that case, it is well established that the AI output would
have at least one layer of copyright protection.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 23:03             ` Paul E. McKenney
@ 2025-08-12 15:47               ` Steven Rostedt
  2025-08-12 16:06                 ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-12 15:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jonathan Corbet, Luck, Tony, Krzysztof Kozlowski, Sasha Levin,
	Jiri Kosina, ksummit

On Mon, 11 Aug 2025 16:03:34 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> At the same time, I freely confess that I am not yet seeing an option
> that brings us much joy, at least for values of "joy" that include actual
> incorporation of AI/ML source-code output into the Linux kernel.

I guess it will only bring AJ (Artificial Joy) :-p

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:28         ` Sasha Levin
@ 2025-08-12 15:49           ` Steven Rostedt
  2025-08-12 16:03             ` Krzysztof Kozlowski
  0 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-12 15:49 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Luck, Tony, Paul E. McKenney, Krzysztof Kozlowski, Jiri Kosina, ksummit

On Mon, 11 Aug 2025 18:28:52 -0400
Sasha Levin <sashal@kernel.org> wrote:

> We have the following in our docs:
> 
>          code from contributors without a known identity or anonymous
>          contributors will not be accepted. All contributors are required
>          to "sign off" on their code
> 
> Which requires a real, known, human identity behind the "Signed-off-by"
> tag.

I guess the real question is, if you have AI write the code, do you have
the right to add your Signed-off-by to it? Especially if you don't know
what that AI was trained on.

Does the Signed-off-by mean if later on, we find that the AI used an
patented algorithm, the one that added their SoB can be in legal trouble?

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 14:42         ` Paul E. McKenney
@ 2025-08-12 15:55           ` Laurent Pinchart
  2025-08-18 21:07           ` Mauro Carvalho Chehab
  1 sibling, 0 replies; 97+ messages in thread
From: Laurent Pinchart @ 2025-08-12 15:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: James Bottomley, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 07:42:21AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 12, 2025 at 09:38:12AM +0100, James Bottomley wrote:
> > On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > > > On 05/08/2025 19:50, Sasha Levin wrote:
> > > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > > 
> > > > > > This is not really about legal aspects of AI-generated code and
> > > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > > DCO, etc.
> > > > > > 
> > > > > > My concern here is more "human to human", as in "if I need to
> > > > > > talk to a human that actually does understand the patch deeply
> > > > > > enough, in context, etc .. who is that?"
> > > > > > 
> > > > > > I believe we need to at least settle on (and document) the way
> > > > > > how to express in patch (meta)data:
> > > > > > 
> > > > > > - this patch has been assisted by LLM $X
> > > > > > - the human understanding the generated code is $Y
> > > > > > 
> > > > > > We might just implicitly assume this to be the first person in
> > > > > > the S-O-B chain (which I personally don't think works for all
> > > > > > scenarios, you can have multiple people working on it, etc),
> > > > > > but even in such case I believe this needs to be clearly
> > > > > > documented.
> > > > > 
> > > > > The above isn't really an AI problem though.
> > > > > 
> > > > > We already have folks sending "checkpatch fixes" which only make
> > > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > > but are completely bogus otherwise.
> > > > > 
> > > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > > problem, but tackling just the AI side of it is addressing one of
> > > > > the symptoms, not the underlying issue.
> > > > 
> > > > I think there is a important difference in process and in result
> > > > between using existing tools, like coccinelle, sparse or even
> > > > checkpatch, and AI-assisted coding.
> > > > 
> > > > For the first you still need to write actual code and since you are
> > > > writing it, most likely you will compile it. Even if people fix the
> > > > warnings, not the problems, they still at least write the code and
> > > > thus this filters at least people who never wrote C.
> > > > 
> > > > With AI you do not have to even write it. It will hallucinate,
> > > > create some sort of C code and you just send it. No need to compile
> > > > it even!
> > > 
> > > Completely agreed, and furthermore, depending on how that AI was
> > > trained, those using that AI's output might have some difficulty
> > > meeting the requirements of the second portion of clause (a) of
> > > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > > submit it under the open source license indicated in the file".
> > 
> > Just on the legality of this.  Under US Law, provided the output isn't
> > a derivative work (and all the suits over training data have so far
> > failed to prove that it is), copyright in an AI created piece of code,
> > actually doesn't exist because a non human entity can't legally hold
> > copyright of a work.  The US copyright office has actually issued this
> > opinion (huge 3 volume report):
> > 
> > https://www.copyright.gov/ai/
> > 
> > But amazingly enough congress has a more succinct summary:
> > 
> > https://www.congress.gov/crs-product/LSB10922
> 
> Indeed:
> 
> 	While the Constitution and Copyright Act do not explicitly define
> 	who (or what) may be an "author," U.S. courts to date have not
> 	recognized copyright in works that lack a human author—including
> 	works created autonomously by AI systems.
> 
> Please note the "U.S. courts *to* *date*".  :-(
> 
> > But the bottom line is that pure AI generated code is effectively
> > uncopyrightable and therefore public domain which means anyone
> > definitely has the right to submit it to the kernel under the DCO.
> > 
> > I imagine this situation might be changed by legislation in the future
> > when people want to monetize AI output, but such a change can't be
> > retroactive, so for now we're OK legally to accept pure AI code with
> > the signoff of the submitter (and whatever AI annotation tags we come
> > up with).
> 
> Except that the USA is a case-law jurisdiction, and changes
> in interpretation of existing laws can be and have been applied
> retroactively, give or take things like statutes of limitations.  And we
> need to worry about more than just USA law.
> 
> And I do agree that many of the lawsuits seem to be motivated by an
> overwhelmening desire to monetize the output of AI that was induced by
> someone else's prompts, if that is what you are getting at.  It does seem
> to me personally that after you have sliced and diced the training data,
> fair use should apply, but last I checked, fair use was a USA-only thing.

I've read many legal arguments or concerns in this mail thread. While
fair use has legal definitions, we seem to have avoided discussing the
ethical aspect so far.

The vast majority of free software licenses were written at a point
where most people were not predicting how open-source code would be used
to train LLMs by large commercial actors. This may not be much of an
issue for the more permissive licenses: if I release code under CC0-1.0,
it can probably be fairlyo assumed I won't oppose to any specific usage
of it, LLM training or otherwise. For copyleft code, on the other hand,
it is way less clear if code authors who picked a specific copyleft
license would approve of proprietary LLMs being trained on their code.
While those authors may or may not have a say from a legal point of view
(far from universally clear at this point), it doesn't meant the Linux
kernel should approve (explicitly or tacitly) a practice that may be
legal but could be considered unethical by a part of the community.

Regardless of the decision we make, I think it's important to take the
ethical argument into consideration. If the kernel decides to approve
usage of LLMs for code generation purpose, those who make that decision
should be accountable for its ethical aspect as well.

> > Of course if you take AI output and modify it before submitting, then
> > the modifications do have copyright (provided a human made them).
> 
> Agreed, in that case, it is well established that the AI output would
> have at least one layer of copyright protection.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:51         ` Paul E. McKenney
  2025-08-11 23:22           ` Luis Chamberlain
@ 2025-08-12 16:01           ` Steven Rostedt
  2025-08-12 16:22             ` Paul E. McKenney
  2025-08-18 21:23           ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-12 16:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Luis Chamberlain, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, 11 Aug 2025 15:51:48 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > a) isn't this debatable? Do we want to itemize a safe list for AI models
> >    which we think are safe to adopt for AI generated code?  
> 
> For my own work, I will continue to avoid use of AI-generated artifacts
> for open-source software projects unless and until some of the more
> consequential "debates" are resolved favorably.

Does that include people who submit AI generated code to you?

This would also require AI use disclosures.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 15:49           ` Steven Rostedt
@ 2025-08-12 16:03             ` Krzysztof Kozlowski
  2025-08-12 16:12               ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Krzysztof Kozlowski @ 2025-08-12 16:03 UTC (permalink / raw)
  To: Steven Rostedt, Sasha Levin
  Cc: Luck, Tony, Paul E. McKenney, Jiri Kosina, ksummit

On 12/08/2025 17:49, Steven Rostedt wrote:
> On Mon, 11 Aug 2025 18:28:52 -0400
> Sasha Levin <sashal@kernel.org> wrote:
> 
>> We have the following in our docs:
>>
>>          code from contributors without a known identity or anonymous
>>          contributors will not be accepted. All contributors are required
>>          to "sign off" on their code
>>
>> Which requires a real, known, human identity behind the "Signed-off-by"
>> tag.
> 
> I guess the real question is, if you have AI write the code, do you have
> the right to add your Signed-off-by to it? Especially if you don't know
> what that AI was trained on.
> 
> Does the Signed-off-by mean if later on, we find that the AI used an
> patented algorithm, the one that added their SoB can be in legal trouble?


Maybe we should be very explicit about annotating AI-generated patches
and instead of (see workflows discussion [1]):

	Assisted-by: ....

expect different tag, like:

	Legal-risk-by:

or:
	Legally-questionable-because-of:

[1] https://lore.kernel.org/r/20250809234008.1540324-1-sashal@kernel.org/

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 15:47               ` Steven Rostedt
@ 2025-08-12 16:06                 ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-12 16:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jonathan Corbet, Luck, Tony, Krzysztof Kozlowski, Sasha Levin,
	Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 11:47:01AM -0400, Steven Rostedt wrote:
> On Mon, 11 Aug 2025 16:03:34 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > At the same time, I freely confess that I am not yet seeing an option
> > that brings us much joy, at least for values of "joy" that include actual
> > incorporation of AI/ML source-code output into the Linux kernel.
> 
> I guess it will only bring AJ (Artificial Joy) :-p

;-) ;-) ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 16:03             ` Krzysztof Kozlowski
@ 2025-08-12 16:12               ` Paul E. McKenney
  2025-08-12 16:17                 ` Krzysztof Kozlowski
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-12 16:12 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Steven Rostedt, Sasha Levin, Luck, Tony, Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 06:03:36PM +0200, Krzysztof Kozlowski wrote:
> On 12/08/2025 17:49, Steven Rostedt wrote:
> > On Mon, 11 Aug 2025 18:28:52 -0400
> > Sasha Levin <sashal@kernel.org> wrote:
> > 
> >> We have the following in our docs:
> >>
> >>          code from contributors without a known identity or anonymous
> >>          contributors will not be accepted. All contributors are required
> >>          to "sign off" on their code
> >>
> >> Which requires a real, known, human identity behind the "Signed-off-by"
> >> tag.
> > 
> > I guess the real question is, if you have AI write the code, do you have
> > the right to add your Signed-off-by to it? Especially if you don't know
> > what that AI was trained on.
> > 
> > Does the Signed-off-by mean if later on, we find that the AI used an
> > patented algorithm, the one that added their SoB can be in legal trouble?
> 
> 
> Maybe we should be very explicit about annotating AI-generated patches
> and instead of (see workflows discussion [1]):
> 
> 	Assisted-by: ....
> 
> expect different tag, like:
> 
> 	Legal-risk-by:
> 
> or:
> 	Legally-questionable-because-of:
> 
> [1] https://lore.kernel.org/r/20250809234008.1540324-1-sashal@kernel.org/

If you have to add one of those last two tags, my carefully considered
advice is to refrain from applying the patch.

Applying a patch containing the first tag might not be wise, either,
depending on the outcome of a number of lawsuits currently in flight.
Plus there are a lot of other countries that might choose to weigh in.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 16:12               ` Paul E. McKenney
@ 2025-08-12 16:17                 ` Krzysztof Kozlowski
  2025-08-12 17:12                   ` Steven Rostedt
  0 siblings, 1 reply; 97+ messages in thread
From: Krzysztof Kozlowski @ 2025-08-12 16:17 UTC (permalink / raw)
  To: paulmck; +Cc: Steven Rostedt, Sasha Levin, Luck, Tony, Jiri Kosina, ksummit

On 12/08/2025 18:12, Paul E. McKenney wrote:
> On Tue, Aug 12, 2025 at 06:03:36PM +0200, Krzysztof Kozlowski wrote:
>> On 12/08/2025 17:49, Steven Rostedt wrote:
>>> On Mon, 11 Aug 2025 18:28:52 -0400
>>> Sasha Levin <sashal@kernel.org> wrote:
>>>
>>>> We have the following in our docs:
>>>>
>>>>          code from contributors without a known identity or anonymous
>>>>          contributors will not be accepted. All contributors are required
>>>>          to "sign off" on their code
>>>>
>>>> Which requires a real, known, human identity behind the "Signed-off-by"
>>>> tag.
>>>
>>> I guess the real question is, if you have AI write the code, do you have
>>> the right to add your Signed-off-by to it? Especially if you don't know
>>> what that AI was trained on.
>>>
>>> Does the Signed-off-by mean if later on, we find that the AI used an
>>> patented algorithm, the one that added their SoB can be in legal trouble?
>>
>>
>> Maybe we should be very explicit about annotating AI-generated patches
>> and instead of (see workflows discussion [1]):
>>
>> 	Assisted-by: ....
>>
>> expect different tag, like:
>>
>> 	Legal-risk-by:
>>
>> or:
>> 	Legally-questionable-because-of:
>>
>> [1] https://lore.kernel.org/r/20250809234008.1540324-1-sashal@kernel.org/
> 
> If you have to add one of those last two tags, my carefully considered
> advice is to refrain from applying the patch.
> 
> Applying a patch containing the first tag might not be wise, either,
> depending on the outcome of a number of lawsuits currently in flight.
> Plus there are a lot of other countries that might choose to weigh in.

Yes, that's what I wanted to imply. At least person applying the patch
and then later sending in pull request to next maintainer could not use
excuse "I did not know, that Assisted-by causes legal risk".

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 16:01           ` Steven Rostedt
@ 2025-08-12 16:22             ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-12 16:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luis Chamberlain, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 12:01:31PM -0400, Steven Rostedt wrote:
> On Mon, 11 Aug 2025 15:51:48 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > a) isn't this debatable? Do we want to itemize a safe list for AI models
> > >    which we think are safe to adopt for AI generated code?  
> > 
> > For my own work, I will continue to avoid use of AI-generated artifacts
> > for open-source software projects unless and until some of the more
> > consequential "debates" are resolved favorably.
> 
> Does that include people who submit AI generated code to you?
> 
> This would also require AI use disclosures.

We have all avoided applying patches containing copyright violations
for a very long time.  And the possibilility of such violations is one
thing that seems to me to be adddressed by the DCO, which says (among
other things):

	I have the right to submit it under the open source license
	indicated in the file

Whether we like it or not, there are lawsuits in flight that could
potentially come to decisions that result in incorporation of AI-generated
code into the Linux kernel being copyright violations, which would
mean that the submitter does not have the right to submit.

And no, I have no way to identify AI-generated code.  If I mistakenly
incorporate some AI-generated code, I must rip it out and clean-room
construct some alternative.  Just as is already the case for other
potential legal issues.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 16:17                 ` Krzysztof Kozlowski
@ 2025-08-12 17:12                   ` Steven Rostedt
  2025-08-12 17:39                     ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-12 17:12 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: paulmck, Sasha Levin, Luck, Tony, Jiri Kosina, ksummit

On Tue, 12 Aug 2025 18:17:46 +0200
Krzysztof Kozlowski <krzk@kernel.org> wrote:

> > Applying a patch containing the first tag might not be wise, either,
> > depending on the outcome of a number of lawsuits currently in flight.
> > Plus there are a lot of other countries that might choose to weigh in.  
> 
> Yes, that's what I wanted to imply. At least person applying the patch
> and then later sending in pull request to next maintainer could not use
> excuse "I did not know, that Assisted-by causes legal risk".

Once you add your SoB, it means "You know". As described in the Submitting
Patches documentation:

  Developer's Certificate of Origin 1.1 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 

  By making a contribution to this project, I certify that:

        (a) The contribution was created in whole or in part by me and I
            have the right to submit it under the open source license
            indicated in the file; or

        (b) The contribution is based upon previous work that, to the best
            of my knowledge, is covered under an appropriate open source
            license and I have the right under that license to submit that
            work with modifications, whether created in whole or in part
            by me, under the same open source license (unless I am
            permitted to submit under a different license), as indicated
            in the file; or 

        (c) The contribution was provided directly to me by some other
            person who certified (a), (b) or (c) and I have not modified
            it.

        (d) I understand and agree that this project and the contribution
            are public and that a record of the contribution (including all
            personal information I submit with it, including my sign-off) is
            maintained indefinitely and may be redistributed consistent with
            this project or the open source license(s) involved.

  then you just add a line saying::
 
        Signed-off-by: Random J Developer <random@developer.example.org>


If you add your SOB and then find out later that your AI tool added some
code that was not allowed, then you broke (a) and (b), and I believe you
are legally liable because your SoB means "You know". If you don't know,
then you should *not* be submitting the code.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 17:12                   ` Steven Rostedt
@ 2025-08-12 17:39                     ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-12 17:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Krzysztof Kozlowski, Sasha Levin, Luck, Tony, Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 01:12:48PM -0400, Steven Rostedt wrote:
> On Tue, 12 Aug 2025 18:17:46 +0200
> Krzysztof Kozlowski <krzk@kernel.org> wrote:
> 
> > > Applying a patch containing the first tag might not be wise, either,
> > > depending on the outcome of a number of lawsuits currently in flight.
> > > Plus there are a lot of other countries that might choose to weigh in.  
> > 
> > Yes, that's what I wanted to imply. At least person applying the patch
> > and then later sending in pull request to next maintainer could not use
> > excuse "I did not know, that Assisted-by causes legal risk".
> 
> Once you add your SoB, it means "You know". As described in the Submitting
> Patches documentation:
> 
>   Developer's Certificate of Origin 1.1 
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
> 
>   By making a contribution to this project, I certify that:
> 
>         (a) The contribution was created in whole or in part by me and I
>             have the right to submit it under the open source license
>             indicated in the file; or
> 
>         (b) The contribution is based upon previous work that, to the best
>             of my knowledge, is covered under an appropriate open source
>             license and I have the right under that license to submit that
>             work with modifications, whether created in whole or in part
>             by me, under the same open source license (unless I am
>             permitted to submit under a different license), as indicated
>             in the file; or 
> 
>         (c) The contribution was provided directly to me by some other
>             person who certified (a), (b) or (c) and I have not modified
>             it.
> 
>         (d) I understand and agree that this project and the contribution
>             are public and that a record of the contribution (including all
>             personal information I submit with it, including my sign-off) is
>             maintained indefinitely and may be redistributed consistent with
>             this project or the open source license(s) involved.
> 
>   then you just add a line saying::
>  
>         Signed-off-by: Random J Developer <random@developer.example.org>
> 
> 
> If you add your SOB and then find out later that your AI tool added some
> code that was not allowed, then you broke (a) and (b), and I believe you
> are legally liable because your SoB means "You know". If you don't know,
> then you should *not* be submitting the code.

Couldn't have said it better myself, thank you!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12  8:38       ` James Bottomley
  2025-08-12 13:15         ` Bird, Tim
  2025-08-12 14:42         ` Paul E. McKenney
@ 2025-08-18 17:53         ` Rafael J. Wysocki
  2025-08-18 18:32           ` James Bottomley
  2025-08-18 19:13         ` Mauro Carvalho Chehab
  3 siblings, 1 reply; 97+ messages in thread
From: Rafael J. Wysocki @ 2025-08-18 17:53 UTC (permalink / raw)
  To: James Bottomley
  Cc: paulmck, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Tue, Aug 12, 2025 at 10:41 AM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:
> > > On 05/08/2025 19:50, Sasha Levin wrote:
> > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > >
> > > > > This is not really about legal aspects of AI-generated code and
> > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > DCO, etc.
> > > > >
> > > > > My concern here is more "human to human", as in "if I need to
> > > > > talk to a human that actually does understand the patch deeply
> > > > > enough, in context, etc .. who is that?"
> > > > >
> > > > > I believe we need to at least settle on (and document) the way
> > > > > how to express in patch (meta)data:
> > > > >
> > > > > - this patch has been assisted by LLM $X
> > > > > - the human understanding the generated code is $Y
> > > > >
> > > > > We might just implicitly assume this to be the first person in
> > > > > the S-O-B chain (which I personally don't think works for all
> > > > > scenarios, you can have multiple people working on it, etc),
> > > > > but even in such case I believe this needs to be clearly
> > > > > documented.
> > > >
> > > > The above isn't really an AI problem though.
> > > >
> > > > We already have folks sending "checkpatch fixes" which only make
> > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > but are completely bogus otherwise.
> > > >
> > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > problem, but tackling just the AI side of it is addressing one of
> > > > the symptoms, not the underlying issue.
> > >
> > > I think there is a important difference in process and in result
> > > between using existing tools, like coccinelle, sparse or even
> > > checkpatch, and AI-assisted coding.
> > >
> > > For the first you still need to write actual code and since you are
> > > writing it, most likely you will compile it. Even if people fix the
> > > warnings, not the problems, they still at least write the code and
> > > thus this filters at least people who never wrote C.
> > >
> > > With AI you do not have to even write it. It will hallucinate,
> > > create some sort of C code and you just send it. No need to compile
> > > it even!
> >
> > Completely agreed, and furthermore, depending on how that AI was
> > trained, those using that AI's output might have some difficulty
> > meeting the requirements of the second portion of clause (a) of
> > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > submit it under the open source license indicated in the file".
>
> Just on the legality of this.  Under US Law, provided the output isn't
> a derivative work (and all the suits over training data have so far
> failed to prove that it is), copyright in an AI created piece of code,
> actually doesn't exist because a non human entity can't legally hold
> copyright of a work.  The US copyright office has actually issued this
> opinion (huge 3 volume report):
>
> https://www.copyright.gov/ai/
>
> But amazingly enough congress has a more succinct summary:
>
> https://www.congress.gov/crs-product/LSB10922
>
> But the bottom line is that pure AI generated code is effectively
> uncopyrightable and therefore public domain which means anyone
> definitely has the right to submit it to the kernel under the DCO.

Well, if it isn't copyrightable, then specicially it cannot be
submitted under the GPLv2 which is required for the kernel, isn't it?

Cheers, Rafael

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 17:53         ` Rafael J. Wysocki
@ 2025-08-18 18:32           ` James Bottomley
  2025-08-19 15:14             ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: James Bottomley @ 2025-08-18 18:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: paulmck, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On August 18, 2025 6:53:22 PM GMT+01:00, "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>On Tue, Aug 12, 2025 at 10:41 AM James Bottomley
><James.Bottomley@hansenpartnership.com> wrote:
[...]
>> But the bottom line is that pure AI generated code is effectively
>> uncopyrightable and therefore public domain which means anyone
>> definitely has the right to submit it to the kernel under the DCO.
>
>Well, if it isn't copyrightable, then specicially it cannot be
>submitted under the GPLv2 which is required for the kernel, isn't it?

No. Public domain code can be combined with any licence (including GPL) because it carries no incompatible obligations since it carries no obligations at all.  You can release public domain code under any licence, but you can't enforce the licence except on additions or modifications because the recipient could have obtained the original from the original obligation free source.

Regards,

James

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12  8:38       ` James Bottomley
                           ` (2 preceding siblings ...)
  2025-08-18 17:53         ` Rafael J. Wysocki
@ 2025-08-18 19:13         ` Mauro Carvalho Chehab
  2025-08-18 19:19           ` Jiri Kosina
  3 siblings, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-18 19:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: paulmck, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

Em Tue, 12 Aug 2025 09:38:12 +0100
James Bottomley <James.Bottomley@HansenPartnership.com> escreveu:

> On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:
> > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:  
> > > On 05/08/2025 19:50, Sasha Levin wrote:  
> > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:  
> > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > 
> > > > > This is not really about legal aspects of AI-generated code and
> > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > DCO, etc.
> > > > > 
> > > > > My concern here is more "human to human", as in "if I need to
> > > > > talk to a human that actually does understand the patch deeply
> > > > > enough, in context, etc .. who is that?"
> > > > > 
> > > > > I believe we need to at least settle on (and document) the way
> > > > > how to express in patch (meta)data:
> > > > > 
> > > > > - this patch has been assisted by LLM $X
> > > > > - the human understanding the generated code is $Y
> > > > > 
> > > > > We might just implicitly assume this to be the first person in
> > > > > the S-O-B chain (which I personally don't think works for all
> > > > > scenarios, you can have multiple people working on it, etc),
> > > > > but even in such case I believe this needs to be clearly
> > > > > documented.  
> > > > 
> > > > The above isn't really an AI problem though.
> > > > 
> > > > We already have folks sending "checkpatch fixes" which only make
> > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > but are completely bogus otherwise.
> > > > 
> > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > problem, but tackling just the AI side of it is addressing one of
> > > > the symptoms, not the underlying issue.  
> > > 
> > > I think there is a important difference in process and in result
> > > between using existing tools, like coccinelle, sparse or even
> > > checkpatch, and AI-assisted coding.
> > > 
> > > For the first you still need to write actual code and since you are
> > > writing it, most likely you will compile it. Even if people fix the
> > > warnings, not the problems, they still at least write the code and
> > > thus this filters at least people who never wrote C.
> > > 
> > > With AI you do not have to even write it. It will hallucinate,
> > > create some sort of C code and you just send it. No need to compile
> > > it even!  
> > 
> > Completely agreed, and furthermore, depending on how that AI was
> > trained, those using that AI's output might have some difficulty
> > meeting the requirements of the second portion of clause (a) of
> > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > submit it under the open source license indicated in the file".  
> 
> Just on the legality of this.  Under US Law, provided the output isn't
> a derivative work (and all the suits over training data have so far
> failed to prove that it is), copyright in an AI created piece of code,
> actually doesn't exist because a non human entity can't legally hold
> copyright of a work.  The US copyright office has actually issued this
> opinion (huge 3 volume report):
> 
> https://www.copyright.gov/ai/
> 
> But amazingly enough congress has a more succinct summary:
> 
> https://www.congress.gov/crs-product/LSB10922
> 
> But the bottom line is that pure AI generated code is effectively
> uncopyrightable and therefore public domain which means anyone
> definitely has the right to submit it to the kernel under the DCO.
> 
> I imagine this situation might be changed by legislation in the future
> when people want to monetize AI output, but such a change can't be
> retroactive, so for now we're OK legally to accept pure AI code with
> the signoff of the submitter (and whatever AI annotation tags we come
> up with).
> 
> Of course if you take AI output and modify it before submitting, then
> the modifications do have copyright (provided a human made them).

On my tests with AI, humans need to modify it anyway. It reminds me
the (not so) good old code generators we had in the past: AI-generated
code, even when it works, it usually have unneeded steps and other
caveats that require human interaction to clean it up and fix.

I got good results with AI for things like generating unit tests, but
once tests are generated, still 50%-60% of them fails because AI
did stupid things, like not counting whitespaces right, and even
sometimes forgetting parameters and arguments.

From several aspects, it looks like contact a very junior intern
that knows a programming language and code really fast, but it has
no glue about how to generate a production quality level code.

After dozens of interactions, the code can be used as the bases for
a senior professional to modify it and have something ready for
merging.

The net result is that:

1. AI alone doesn't produce a ready-to-merge code;
2. Lots of refinement requirements made by humans to shape the code 
   into something that actually works;
3. During AI interaction, human has to intervene several times to
   avoid AI to hallucinate. Sometimes, it also has to close the
   chat and open again - or even use a different LLM model when
   AI can't converge;
4. At best scenario, human still needs to read the code and carefully
   modify for it to make sense; at worse, it has to write its own
   code, eventually using some suggestions from the AI hallucination.

Heh, there are exceptions: if one asks AI to produce a hello world
code (or something that "plays by the book" - e.g. when AI can use
thousands of references from code in public domain) the code is not 
that bad: it is just a variant of some public domain code.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 19:13         ` Mauro Carvalho Chehab
@ 2025-08-18 19:19           ` Jiri Kosina
  2025-08-18 19:44             ` Rafael J. Wysocki
  0 siblings, 1 reply; 97+ messages in thread
From: Jiri Kosina @ 2025-08-18 19:19 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: James Bottomley, paulmck, Krzysztof Kozlowski, Sasha Levin, ksummit

On Mon, 18 Aug 2025, Mauro Carvalho Chehab wrote:

> On my tests with AI, humans need to modify it anyway. It reminds me
> the (not so) good old code generators we had in the past: AI-generated
> code, even when it works, it usually have unneeded steps and other
> caveats that require human interaction to clean it up and fix.
> 
> I got good results with AI for things like generating unit tests, but
> once tests are generated, still 50%-60% of them fails because AI
> did stupid things, like not counting whitespaces right, and even
> sometimes forgetting parameters and arguments.
> 
> From several aspects, it looks like contact a very junior intern
> that knows a programming language and code really fast, but it has
> no glue about how to generate a production quality level code.
> 
> After dozens of interactions, the code can be used as the bases for
> a senior professional to modify it and have something ready for
> merging.
> 
> The net result is that:
> 
> 1. AI alone doesn't produce a ready-to-merge code;
> 2. Lots of refinement requirements made by humans to shape the code 
>    into something that actually works;
> 3. During AI interaction, human has to intervene several times to
>    avoid AI to hallucinate. Sometimes, it also has to close the
>    chat and open again - or even use a different LLM model when
>    AI can't converge;
> 4. At best scenario, human still needs to read the code and carefully
>    modify for it to make sense; at worse, it has to write its own
>    code, eventually using some suggestions from the AI hallucination.

And the point is -- this all has now been much more easily available, and 
the increase pressure on maintainers is inevitable (pretty much everybody 
is now capable of submitting OK-ish looking code), so the 
submitter/maintainer ratio might become very unfair/unbalanced.

Hence the need (I believe) to require proper annotation, even with all the 
legal aspect aside.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 19:19           ` Jiri Kosina
@ 2025-08-18 19:44             ` Rafael J. Wysocki
  2025-08-18 19:47               ` Jiri Kosina
  0 siblings, 1 reply; 97+ messages in thread
From: Rafael J. Wysocki @ 2025-08-18 19:44 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Mauro Carvalho Chehab, James Bottomley, paulmck,
	Krzysztof Kozlowski, Sasha Levin, ksummit

On Mon, Aug 18, 2025 at 9:19 PM Jiri Kosina <jikos@kernel.org> wrote:
>
> On Mon, 18 Aug 2025, Mauro Carvalho Chehab wrote:
>
> > On my tests with AI, humans need to modify it anyway. It reminds me
> > the (not so) good old code generators we had in the past: AI-generated
> > code, even when it works, it usually have unneeded steps and other
> > caveats that require human interaction to clean it up and fix.
> >
> > I got good results with AI for things like generating unit tests, but
> > once tests are generated, still 50%-60% of them fails because AI
> > did stupid things, like not counting whitespaces right, and even
> > sometimes forgetting parameters and arguments.
> >
> > From several aspects, it looks like contact a very junior intern
> > that knows a programming language and code really fast, but it has
> > no glue about how to generate a production quality level code.
> >
> > After dozens of interactions, the code can be used as the bases for
> > a senior professional to modify it and have something ready for
> > merging.
> >
> > The net result is that:
> >
> > 1. AI alone doesn't produce a ready-to-merge code;
> > 2. Lots of refinement requirements made by humans to shape the code
> >    into something that actually works;
> > 3. During AI interaction, human has to intervene several times to
> >    avoid AI to hallucinate. Sometimes, it also has to close the
> >    chat and open again - or even use a different LLM model when
> >    AI can't converge;
> > 4. At best scenario, human still needs to read the code and carefully
> >    modify for it to make sense; at worse, it has to write its own
> >    code, eventually using some suggestions from the AI hallucination.
>
> And the point is -- this all has now been much more easily available, and
> the increase pressure on maintainers is inevitable (pretty much everybody
> is now capable of submitting OK-ish looking code), so the
> submitter/maintainer ratio might become very unfair/unbalanced.
>
> Hence the need (I believe) to require proper annotation, even with all the
> legal aspect aside.

I tend to agree that such annotations might be useful as heads-up
markers for maintainers if nothing else, but what about missing
annotations?

Is there a generally feasible way to figure out that they are missing?
 And if that can be done, "suspicious" changes may as well be caught
this way, so why would the annotations be required after all?

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 19:44             ` Rafael J. Wysocki
@ 2025-08-18 19:47               ` Jiri Kosina
  2025-08-18 22:44                 ` Laurent Pinchart
  0 siblings, 1 reply; 97+ messages in thread
From: Jiri Kosina @ 2025-08-18 19:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, James Bottomley, paulmck,
	Krzysztof Kozlowski, Sasha Levin, ksummit

On Mon, 18 Aug 2025, Rafael J. Wysocki wrote:

> I tend to agree that such annotations might be useful as heads-up
> markers for maintainers if nothing else, but what about missing
> annotations?
> 
> Is there a generally feasible way to figure out that they are missing?

Maybe we can use some LLM to help us decide whether the code has been 
written by a human or LLM :P

>  And if that can be done, "suspicious" changes may as well be caught
> this way, so why would the annotations be required after all?

I am not sure whether we have more options than documenting this 
requirement, and then work with our usual tool, which is building trust 
(or lack of thereof) in the individual submitters.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 14:42         ` Paul E. McKenney
  2025-08-12 15:55           ` Laurent Pinchart
@ 2025-08-18 21:07           ` Mauro Carvalho Chehab
  2025-08-19 15:15             ` Paul E. McKenney
  2025-08-19 15:23             ` James Bottomley
  1 sibling, 2 replies; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-18 21:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: James Bottomley, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

Em Tue, 12 Aug 2025 07:42:21 -0700
"Paul E. McKenney" <paulmck@kernel.org> escreveu:

> On Tue, Aug 12, 2025 at 09:38:12AM +0100, James Bottomley wrote:
> > On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:  
> > > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:  
> > > > On 05/08/2025 19:50, Sasha Levin wrote:  
> > > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:  
> > > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > > 
> > > > > > This is not really about legal aspects of AI-generated code and
> > > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > > DCO, etc.
> > > > > > 
> > > > > > My concern here is more "human to human", as in "if I need to
> > > > > > talk to a human that actually does understand the patch deeply
> > > > > > enough, in context, etc .. who is that?"
> > > > > > 
> > > > > > I believe we need to at least settle on (and document) the way
> > > > > > how to express in patch (meta)data:
> > > > > > 
> > > > > > - this patch has been assisted by LLM $X
> > > > > > - the human understanding the generated code is $Y
> > > > > > 
> > > > > > We might just implicitly assume this to be the first person in
> > > > > > the S-O-B chain (which I personally don't think works for all
> > > > > > scenarios, you can have multiple people working on it, etc),
> > > > > > but even in such case I believe this needs to be clearly
> > > > > > documented.  
> > > > > 
> > > > > The above isn't really an AI problem though.
> > > > > 
> > > > > We already have folks sending "checkpatch fixes" which only make
> > > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > > but are completely bogus otherwise.
> > > > > 
> > > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > > problem, but tackling just the AI side of it is addressing one of
> > > > > the symptoms, not the underlying issue.  
> > > > 
> > > > I think there is a important difference in process and in result
> > > > between using existing tools, like coccinelle, sparse or even
> > > > checkpatch, and AI-assisted coding.
> > > > 
> > > > For the first you still need to write actual code and since you are
> > > > writing it, most likely you will compile it. Even if people fix the
> > > > warnings, not the problems, they still at least write the code and
> > > > thus this filters at least people who never wrote C.
> > > > 
> > > > With AI you do not have to even write it. It will hallucinate,
> > > > create some sort of C code and you just send it. No need to compile
> > > > it even!  
> > > 
> > > Completely agreed, and furthermore, depending on how that AI was
> > > trained, those using that AI's output might have some difficulty
> > > meeting the requirements of the second portion of clause (a) of
> > > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > > submit it under the open source license indicated in the file".  
> > 
> > Just on the legality of this.  Under US Law, provided the output isn't
> > a derivative work (and all the suits over training data have so far
> > failed to prove that it is), copyright in an AI created piece of code,
> > actually doesn't exist because a non human entity can't legally hold
> > copyright of a work.  The US copyright office has actually issued this
> > opinion (huge 3 volume report):
> > 
> > https://www.copyright.gov/ai/
> > 
> > But amazingly enough congress has a more succinct summary:
> > 
> > https://www.congress.gov/crs-product/LSB10922  
> 
> Indeed:
> 
> 	While the Constitution and Copyright Act do not explicitly define
> 	who (or what) may be an "author," U.S. courts to date have not
> 	recognized copyright in works that lack a human author—including
> 	works created autonomously by AI systems.
> 
> Please note the "U.S. courts *to* *date*".  :-(
> 
> > But the bottom line is that pure AI generated code is effectively
> > uncopyrightable and therefore public domain which means anyone
> > definitely has the right to submit it to the kernel under the DCO.
> > 
> > I imagine this situation might be changed by legislation in the future
> > when people want to monetize AI output, but such a change can't be
> > retroactive, so for now we're OK legally to accept pure AI code with
> > the signoff of the submitter (and whatever AI annotation tags we come
> > up with).  
> 
> Except that the USA is a case-law jurisdiction, and changes
> in interpretation of existing laws can be and have been applied
> retroactively, give or take things like statutes of limitations.  And we
> need to worry about more than just USA law.
> 
> And I do agree that many of the lawsuits seem to be motivated by an
> overwhelmening desire to monetize the output of AI that was induced by
> someone else's prompts, if that is what you are getting at.  It does seem
> to me personally that after you have sliced and diced the training data,
> fair use should apply, but last I checked, fair use was a USA-only thing.

Maybe, but other Countries have similar concepts. I remember I saw an
interpretation of the Brazilian copyright law once from a famous layer
at property rights matter, stating that reproducing small parts of a book, 
for instance, could be ok, under certain circumstances (in a concept
similar to US fair use).


Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-12 13:15         ` Bird, Tim
  2025-08-12 14:31           ` Greg KH
@ 2025-08-18 21:12           ` Mauro Carvalho Chehab
  2025-08-19 15:01             ` Paul E. McKenney
  1 sibling, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-18 21:12 UTC (permalink / raw)
  To: Bird, Tim
  Cc: James Bottomley, paulmck, Krzysztof Kozlowski, Sasha Levin,
	Jiri Kosina, ksummit

Em Tue, 12 Aug 2025 13:15:33 +0000
"Bird, Tim" <Tim.Bird@sony.com> escreveu:

> > -----Original Message-----
> > From: James Bottomley <James.Bottomley@HansenPartnership.com>
> > On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:  
> > > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:  
> > > > On 05/08/2025 19:50, Sasha Levin wrote:  
> > > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:  
> > > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > >
> > > > > > This is not really about legal aspects of AI-generated code and
> > > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > > DCO, etc.
> > > > > >
> > > > > > My concern here is more "human to human", as in "if I need to
> > > > > > talk to a human that actually does understand the patch deeply
> > > > > > enough, in context, etc .. who is that?"
> > > > > >
> > > > > > I believe we need to at least settle on (and document) the way
> > > > > > how to express in patch (meta)data:
> > > > > >
> > > > > > - this patch has been assisted by LLM $X
> > > > > > - the human understanding the generated code is $Y
> > > > > >
> > > > > > We might just implicitly assume this to be the first person in
> > > > > > the S-O-B chain (which I personally don't think works for all
> > > > > > scenarios, you can have multiple people working on it, etc),
> > > > > > but even in such case I believe this needs to be clearly
> > > > > > documented.  
> > > > >
> > > > > The above isn't really an AI problem though.
> > > > >
> > > > > We already have folks sending "checkpatch fixes" which only make
> > > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > > but are completely bogus otherwise.
> > > > >
> > > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > > problem, but tackling just the AI side of it is addressing one of
> > > > > the symptoms, not the underlying issue.  
> > > >
> > > > I think there is a important difference in process and in result
> > > > between using existing tools, like coccinelle, sparse or even
> > > > checkpatch, and AI-assisted coding.
> > > >
> > > > For the first you still need to write actual code and since you are
> > > > writing it, most likely you will compile it. Even if people fix the
> > > > warnings, not the problems, they still at least write the code and
> > > > thus this filters at least people who never wrote C.
> > > >
> > > > With AI you do not have to even write it. It will hallucinate,
> > > > create some sort of C code and you just send it. No need to compile
> > > > it even!  
> > >
> > > Completely agreed, and furthermore, depending on how that AI was
> > > trained, those using that AI's output might have some difficulty
> > > meeting the requirements of the second portion of clause (a) of
> > > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > > submit it under the open source license indicated in the file".  
> > 
> > Just on the legality of this.  Under US Law, provided the output isn't
> > a derivative work (and all the suits over training data have so far
> > failed to prove that it is),  
> 
> This is indeed so.  I have followed the GitHub copilot litigation
> (see https://githubcopilotlitigation.com/case-updates.html), and a few
> other cases related to whether AI output violates the copyright of the training
> data (that is, is a form of derivative work).  I'm not a lawyer, but the legal
> reasoning for judgements passed down so far have been, IMHO, atrocious.
> Some claims have been thrown out because the output was not identical
> to the training data (even when things like comments from the code in
> the training data were copied verbatim into the output).  Companies doing
> AI code generation now scrub their outputs to make sure nothing
> in the output is identical to material in the training data.  However, I'm not
> sure this is enough, and this requirement for identicality (to prove derivative work)
> is problematic, when copyright law only requires proof of substantial similarity.
> 
> The copilot case is going through appeal now, and I wouldn't bet on which
> way the outcome will drop.  It could very well yet result that AI output is deemed
> to be derivative work of the training data in some cases.  If that occurs, then even restricting
> training data to GPL code wouldn't be a sufficient workaround to enable using the AI output
> in the kernel.  And, as has been stated elsewhere, there are no currently no major models restricting
> their code training data to permissively licensed code.  This makes it infeasible to use
> any of the popular models with a high degree of certainty that the output is legally OK.
> 
> No legal pun intended, but I think the jury is still out on this issue, and I think it
> would be wise to be EXTREMELY cautious introducing AI-generated code into the kernel.
> I personally would not submit something for inclusion into the kernel proper that
> was AI-generated.  Generation of tools or tests is, IMO, a different matter and I'm
> less concerned about that.
> 
> Getting back to the discussion at hand, I believe that annotating that a contribution was
> AI-generated (or that AI was involved) will at least give us some assistance to re-review
> the code and possibly remove or replace it should the legal status of AI-generated code
> become problematic in the future.

Heh, it could produce exactly the opposite effect: anyone that may have
a code that slightly resembles a patch stating that AI was used could try
to monetize from such patch merge.

> 
> There is also value in flagging that additional scrutiny may be warranted
> at the time of submission.  So I like the idea in principal.


Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 22:51         ` Paul E. McKenney
  2025-08-11 23:22           ` Luis Chamberlain
  2025-08-12 16:01           ` Steven Rostedt
@ 2025-08-18 21:23           ` Mauro Carvalho Chehab
  2025-08-19 15:25             ` Paul E. McKenney
  2 siblings, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-18 21:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Luis Chamberlain, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

Em Mon, 11 Aug 2025 15:51:48 -0700
"Paul E. McKenney" <paulmck@kernel.org> escreveu:

> On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:  
> > > depending on how that AI was
> > > trained, those using that AI's output might have some difficulty meeting
> > > the requirements of the second portion of clause (a) of Developer's
> > > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> > > the open source license indicated in the file".  
> > 
> > If the argument is that cetain LLM generated code cannot be used for code under
> > the DCO, then:
> > 
> > a) isn't this debatable? Do we want to itemize a safe list for AI models
> >    which we think are safe to adopt for AI generated code?  
> 
> For my own work, I will continue to avoid use of AI-generated artifacts
> for open-source software projects unless and until some of the more
> consequential "debates" are resolved favorably.
> 
> > b) seems kind of too late  
> 
> Why?
> 
> > c) If something like the Generated-by tag is used, and we trust it, then
> >    if we do want to side against merging AI generated code, that's perhaps our
> >    only chance at blocking that type of code. Its however not bullet proof.  
> 
> Nothing is bullet proof.  ;-)

Let's face reality: before AI generation, more than one time I
received completely identical patches from different developers
with exactly the same content. Sometimes, even the descriptions
were similar. I got one or twice the same description even.

Granted, those are bug fixes for obvious fixes (usually one liners), but
the point is: there are certain software patterns that are so common 
that there are lots of developers around the globe whose are familiar
with. This is not different from a AI: if one asks it to write a DPS code 
in some language (C, C++, Python, you name it), I bet the code will be
at least 90% similar to any other code you or anyone else would write.

The rationale is that we're all trained directly or indirectly
(including AI) with the same textbook algorithms or from someone
that used such textbooks.

I can't see AI making it any better or worse from what we already
have.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-11 23:22           ` Luis Chamberlain
  2025-08-11 23:42             ` Paul E. McKenney
@ 2025-08-18 21:41             ` Mauro Carvalho Chehab
  2025-08-20 21:48               ` Paul E. McKenney
  1 sibling, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-18 21:41 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Paul E. McKenney, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

Em Mon, 11 Aug 2025 16:22:21 -0700
Luis Chamberlain <mcgrof@kernel.org> escreveu:

> On Mon, Aug 11, 2025 at 03:51:48PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:  
> > > b) seems kind of too late  
> > 
> > Why?  
> 
> One cannot assume at this point AI generated code has not been merged
> into any large scale open source project.
> 
> I am also not sure it can be stopped.

Agreed. The same applies to all other patches: nobody can really tell if
some code could potentially contain code not developed by the submitter.

To be frank, considering that most companies nowadays have policies of
not using public AI for private code, I suspect that AI generated code
contains only public domain code or open source. As open source licenses
explicitly allow one to learn from the written code, except if AI
(and the developer using it) are just copying the code, it will very
likely be at the already allowed open source license clauses.

Now, when someone from a company submits a patch for the company
hardware, for instance, it is a lot harder for a maintainer to be
sure that such submission was approved. The SoB is a sort of
protection for us, as the submitter declared that he had the
permissions.

So, at least from my side, provided that the patch is good(*), I'm not
concerned if it used AI to help him or not.

(*) good patch means that, even if AI was used, a human adjusted
    it to ensure its quality.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 19:47               ` Jiri Kosina
@ 2025-08-18 22:44                 ` Laurent Pinchart
  0 siblings, 0 replies; 97+ messages in thread
From: Laurent Pinchart @ 2025-08-18 22:44 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Rafael J. Wysocki, Mauro Carvalho Chehab, James Bottomley,
	paulmck, Krzysztof Kozlowski, Sasha Levin, ksummit

On Mon, Aug 18, 2025 at 09:47:04PM +0200, Jiri Kosina wrote:
> On Mon, 18 Aug 2025, Rafael J. Wysocki wrote:
> 
> > I tend to agree that such annotations might be useful as heads-up
> > markers for maintainers if nothing else, but what about missing
> > annotations?
> > 
> > Is there a generally feasible way to figure out that they are missing?
> 
> Maybe we can use some LLM to help us decide whether the code has been 
> written by a human or LLM :P
> 
> >  And if that can be done, "suspicious" changes may as well be caught
> > this way, so why would the annotations be required after all?
> 
> I am not sure whether we have more options than documenting this 
> requirement, and then work with our usual tool, which is building trust 
> (or lack of thereof) in the individual submitters.

At this point, I would expect contributors to not mention that code has
been generated by an LLM for three reasons, in decreasing order of
frequency:

- Because we don't tell them they need to (that's the current situation)
- Becuase they don't know they need to (people don't read documentation,
  so we'll have to find ways to get the message through)
- Because they maliciously decide to breach the rule

I'm not too concerned about the third reason for the time being, as we
have way more developers acting in good faith than bad faith. If that
changes, we'll have to figure out how to handle the problem.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 21:12           ` Mauro Carvalho Chehab
@ 2025-08-19 15:01             ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-19 15:01 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Bird, Tim, James Bottomley, Krzysztof Kozlowski, Sasha Levin,
	Jiri Kosina, ksummit

On Mon, Aug 18, 2025 at 11:12:23PM +0200, Mauro Carvalho Chehab wrote:
> Em Tue, 12 Aug 2025 13:15:33 +0000
> "Bird, Tim" <Tim.Bird@sony.com> escreveu:
> 
> > > -----Original Message-----
> > > From: James Bottomley <James.Bottomley@HansenPartnership.com>
> > > On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:  
> > > > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:  
> > > > > On 05/08/2025 19:50, Sasha Levin wrote:  
> > > > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:  
> > > > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > > >
> > > > > > > This is not really about legal aspects of AI-generated code and
> > > > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > > > DCO, etc.
> > > > > > >
> > > > > > > My concern here is more "human to human", as in "if I need to
> > > > > > > talk to a human that actually does understand the patch deeply
> > > > > > > enough, in context, etc .. who is that?"
> > > > > > >
> > > > > > > I believe we need to at least settle on (and document) the way
> > > > > > > how to express in patch (meta)data:
> > > > > > >
> > > > > > > - this patch has been assisted by LLM $X
> > > > > > > - the human understanding the generated code is $Y
> > > > > > >
> > > > > > > We might just implicitly assume this to be the first person in
> > > > > > > the S-O-B chain (which I personally don't think works for all
> > > > > > > scenarios, you can have multiple people working on it, etc),
> > > > > > > but even in such case I believe this needs to be clearly
> > > > > > > documented.  
> > > > > >
> > > > > > The above isn't really an AI problem though.
> > > > > >
> > > > > > We already have folks sending "checkpatch fixes" which only make
> > > > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > > > but are completely bogus otherwise.
> > > > > >
> > > > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > > > problem, but tackling just the AI side of it is addressing one of
> > > > > > the symptoms, not the underlying issue.  
> > > > >
> > > > > I think there is a important difference in process and in result
> > > > > between using existing tools, like coccinelle, sparse or even
> > > > > checkpatch, and AI-assisted coding.
> > > > >
> > > > > For the first you still need to write actual code and since you are
> > > > > writing it, most likely you will compile it. Even if people fix the
> > > > > warnings, not the problems, they still at least write the code and
> > > > > thus this filters at least people who never wrote C.
> > > > >
> > > > > With AI you do not have to even write it. It will hallucinate,
> > > > > create some sort of C code and you just send it. No need to compile
> > > > > it even!  
> > > >
> > > > Completely agreed, and furthermore, depending on how that AI was
> > > > trained, those using that AI's output might have some difficulty
> > > > meeting the requirements of the second portion of clause (a) of
> > > > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > > > submit it under the open source license indicated in the file".  
> > > 
> > > Just on the legality of this.  Under US Law, provided the output isn't
> > > a derivative work (and all the suits over training data have so far
> > > failed to prove that it is),  
> > 
> > This is indeed so.  I have followed the GitHub copilot litigation
> > (see https://githubcopilotlitigation.com/case-updates.html), and a few
> > other cases related to whether AI output violates the copyright of the training
> > data (that is, is a form of derivative work).  I'm not a lawyer, but the legal
> > reasoning for judgements passed down so far have been, IMHO, atrocious.
> > Some claims have been thrown out because the output was not identical
> > to the training data (even when things like comments from the code in
> > the training data were copied verbatim into the output).  Companies doing
> > AI code generation now scrub their outputs to make sure nothing
> > in the output is identical to material in the training data.  However, I'm not
> > sure this is enough, and this requirement for identicality (to prove derivative work)
> > is problematic, when copyright law only requires proof of substantial similarity.
> > 
> > The copilot case is going through appeal now, and I wouldn't bet on which
> > way the outcome will drop.  It could very well yet result that AI output is deemed
> > to be derivative work of the training data in some cases.  If that occurs, then even restricting
> > training data to GPL code wouldn't be a sufficient workaround to enable using the AI output
> > in the kernel.  And, as has been stated elsewhere, there are no currently no major models restricting
> > their code training data to permissively licensed code.  This makes it infeasible to use
> > any of the popular models with a high degree of certainty that the output is legally OK.
> > 
> > No legal pun intended, but I think the jury is still out on this issue, and I think it
> > would be wise to be EXTREMELY cautious introducing AI-generated code into the kernel.
> > I personally would not submit something for inclusion into the kernel proper that
> > was AI-generated.  Generation of tools or tests is, IMO, a different matter and I'm
> > less concerned about that.
> > 
> > Getting back to the discussion at hand, I believe that annotating that a contribution was
> > AI-generated (or that AI was involved) will at least give us some assistance to re-review
> > the code and possibly remove or replace it should the legal status of AI-generated code
> > become problematic in the future.
> 
> Heh, it could produce exactly the opposite effect: anyone that may have
> a code that slightly resembles a patch stating that AI was used could try
> to monetize from such patch merge.

This is one of my concerns as well.

							Thanx, Paul

> > There is also value in flagging that additional scrutiny may be warranted
> > at the time of submission.  So I like the idea in principal.
> 
> 
> Thanks,
> Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 18:32           ` James Bottomley
@ 2025-08-19 15:14             ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-19 15:14 UTC (permalink / raw)
  To: James Bottomley
  Cc: Rafael J. Wysocki, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina,
	ksummit

On Mon, Aug 18, 2025 at 07:32:26PM +0100, James Bottomley wrote:
> On August 18, 2025 6:53:22 PM GMT+01:00, "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> >On Tue, Aug 12, 2025 at 10:41 AM James Bottomley
> ><James.Bottomley@hansenpartnership.com> wrote:
> [...]
> >> But the bottom line is that pure AI generated code is effectively
> >> uncopyrightable and therefore public domain which means anyone
> >> definitely has the right to submit it to the kernel under the DCO.

I sympathize with this argument, and I hope that it prevails.  But there
is no guarantee that it will do so.

I mean, sure, there is precedent going back centuries that a given human
being can ingest large quantities of copyrighted material, and generate
a work that *by* *default* has no copyright connection to any of the
ingested material.  And sure, there is also less-well-established but
still good reason to believe that only human beings can hold copyright.
And putting those two together would give your "bottom line", that the
output of an AI is in public domain, just like that famous simian selfie.
(Of course, that "by default" is subject to plaigarism tests.)

But this argument already assumes that human beings are special, which
might or might not augur well for the argument that AI-generated output
based on copyrighted input should be treated the same as is similar
human-generated output.

Again, I sympathize with your position and I hope that it proves to be
correct, but I don't see that we are there yet, if in fact we ever get
there at all.

Or do you have a public statement from (say) a Linux Foundation attorney
that we can rely on?

> >Well, if it isn't copyrightable, then specicially it cannot be
> >submitted under the GPLv2 which is required for the kernel, isn't it?
> 
> No. Public domain code can be combined with any licence (including GPL) because it carries no incompatible obligations since it carries no obligations at all.  You can release public domain code under any licence, but you can't enforce the licence except on additions or modifications because the recipient could have obtained the original from the original obligation free source.

But I do agree that public-domain code can be combined with GPLv2 code.
At least assuming that we maintain a sufficient paper trail back to the
original public-domain code.

							Thanx, Paul

> Regards,
> 
> James
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 21:07           ` Mauro Carvalho Chehab
@ 2025-08-19 15:15             ` Paul E. McKenney
  2025-08-19 15:23             ` James Bottomley
  1 sibling, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-19 15:15 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: James Bottomley, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 18, 2025 at 11:07:29PM +0200, Mauro Carvalho Chehab wrote:
> Em Tue, 12 Aug 2025 07:42:21 -0700
> "Paul E. McKenney" <paulmck@kernel.org> escreveu:
> 
> > On Tue, Aug 12, 2025 at 09:38:12AM +0100, James Bottomley wrote:
> > > On Mon, 2025-08-11 at 14:46 -0700, Paul E. McKenney wrote:  
> > > > On Fri, Aug 08, 2025 at 10:31:27AM +0200, Krzysztof Kozlowski wrote:  
> > > > > On 05/08/2025 19:50, Sasha Levin wrote:  
> > > > > > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:  
> > > > > > > This proposal is pretty much followup/spinoff of the discussion
> > > > > > > currently happening on LKML in one of the sub-threads of [1].
> > > > > > > 
> > > > > > > This is not really about legal aspects of AI-generated code and
> > > > > > > patches, I believe that'd be handled well handled well by LF,
> > > > > > > DCO, etc.
> > > > > > > 
> > > > > > > My concern here is more "human to human", as in "if I need to
> > > > > > > talk to a human that actually does understand the patch deeply
> > > > > > > enough, in context, etc .. who is that?"
> > > > > > > 
> > > > > > > I believe we need to at least settle on (and document) the way
> > > > > > > how to express in patch (meta)data:
> > > > > > > 
> > > > > > > - this patch has been assisted by LLM $X
> > > > > > > - the human understanding the generated code is $Y
> > > > > > > 
> > > > > > > We might just implicitly assume this to be the first person in
> > > > > > > the S-O-B chain (which I personally don't think works for all
> > > > > > > scenarios, you can have multiple people working on it, etc),
> > > > > > > but even in such case I believe this needs to be clearly
> > > > > > > documented.  
> > > > > > 
> > > > > > The above isn't really an AI problem though.
> > > > > > 
> > > > > > We already have folks sending "checkpatch fixes" which only make
> > > > > > code less readable or "syzbot fixes" that shut up the warnings
> > > > > > but are completely bogus otherwise.
> > > > > > 
> > > > > > Sure, folks sending "AI fixes" could (will?) be a growing
> > > > > > problem, but tackling just the AI side of it is addressing one of
> > > > > > the symptoms, not the underlying issue.  
> > > > > 
> > > > > I think there is a important difference in process and in result
> > > > > between using existing tools, like coccinelle, sparse or even
> > > > > checkpatch, and AI-assisted coding.
> > > > > 
> > > > > For the first you still need to write actual code and since you are
> > > > > writing it, most likely you will compile it. Even if people fix the
> > > > > warnings, not the problems, they still at least write the code and
> > > > > thus this filters at least people who never wrote C.
> > > > > 
> > > > > With AI you do not have to even write it. It will hallucinate,
> > > > > create some sort of C code and you just send it. No need to compile
> > > > > it even!  
> > > > 
> > > > Completely agreed, and furthermore, depending on how that AI was
> > > > trained, those using that AI's output might have some difficulty
> > > > meeting the requirements of the second portion of clause (a) of
> > > > Developer's Certificate of Origin (DCO) 1.1: "I have the right to
> > > > submit it under the open source license indicated in the file".  
> > > 
> > > Just on the legality of this.  Under US Law, provided the output isn't
> > > a derivative work (and all the suits over training data have so far
> > > failed to prove that it is), copyright in an AI created piece of code,
> > > actually doesn't exist because a non human entity can't legally hold
> > > copyright of a work.  The US copyright office has actually issued this
> > > opinion (huge 3 volume report):
> > > 
> > > https://www.copyright.gov/ai/
> > > 
> > > But amazingly enough congress has a more succinct summary:
> > > 
> > > https://www.congress.gov/crs-product/LSB10922  
> > 
> > Indeed:
> > 
> > 	While the Constitution and Copyright Act do not explicitly define
> > 	who (or what) may be an "author," U.S. courts to date have not
> > 	recognized copyright in works that lack a human author—including
> > 	works created autonomously by AI systems.
> > 
> > Please note the "U.S. courts *to* *date*".  :-(
> > 
> > > But the bottom line is that pure AI generated code is effectively
> > > uncopyrightable and therefore public domain which means anyone
> > > definitely has the right to submit it to the kernel under the DCO.
> > > 
> > > I imagine this situation might be changed by legislation in the future
> > > when people want to monetize AI output, but such a change can't be
> > > retroactive, so for now we're OK legally to accept pure AI code with
> > > the signoff of the submitter (and whatever AI annotation tags we come
> > > up with).  
> > 
> > Except that the USA is a case-law jurisdiction, and changes
> > in interpretation of existing laws can be and have been applied
> > retroactively, give or take things like statutes of limitations.  And we
> > need to worry about more than just USA law.
> > 
> > And I do agree that many of the lawsuits seem to be motivated by an
> > overwhelmening desire to monetize the output of AI that was induced by
> > someone else's prompts, if that is what you are getting at.  It does seem
> > to me personally that after you have sliced and diced the training data,
> > fair use should apply, but last I checked, fair use was a USA-only thing.
> 
> Maybe, but other Countries have similar concepts. I remember I saw an
> interpretation of the Brazilian copyright law once from a famous layer
> at property rights matter, stating that reproducing small parts of a book, 
> for instance, could be ok, under certain circumstances (in a concept
> similar to US fair use).

Understood and agreed.  And in the worst case, this battle must be fought
separately in each legal jurisdiction.  I do hope that it does not come
to that, but...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 21:07           ` Mauro Carvalho Chehab
  2025-08-19 15:15             ` Paul E. McKenney
@ 2025-08-19 15:23             ` James Bottomley
  2025-08-19 16:16               ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 97+ messages in thread
From: James Bottomley @ 2025-08-19 15:23 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Paul E. McKenney
  Cc: Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On August 18, 2025 10:07:29 PM GMT+01:00, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>Em Tue, 12 Aug 2025 07:42:21 -0700
>"Paul E. McKenney" <paulmck@kernel.org> escreveu:
[...]
> do agree that many of the lawsuits seem to be motivated by an
>> overwhelmening desire to monetize the output of AI that was induced by
>> someone else's prompts, if that is what you are getting at.  It does seem
>> to me personally that after you have sliced and diced the training data,
>> fair use should apply, but last I checked, fair use was a USA-only thing.
>
>Maybe, but other Countries have similar concepts. I remember I saw an
>interpretation of the Brazilian copyright law once from a famous layer
>at property rights matter, stating that reproducing small parts of a book, 
>for instance, could be ok, under certain circumstances (in a concept
>similar to US fair use).

Yes, technically.  Article 10 of the Berne convention contains a weaker concept allowing quotations without encumbrance based on a three prong test that the quote isn't extensive,  doesn't rob the rights holder of substantial royalties and doesn't unreasonably prejudice the existing copyright rights.

Regards,

James


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 21:23           ` Mauro Carvalho Chehab
@ 2025-08-19 15:25             ` Paul E. McKenney
  2025-08-19 16:27               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-19 15:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luis Chamberlain, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 18, 2025 at 11:23:32PM +0200, Mauro Carvalho Chehab wrote:
> Em Mon, 11 Aug 2025 15:51:48 -0700
> "Paul E. McKenney" <paulmck@kernel.org> escreveu:
> 
> > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > > On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:  
> > > > depending on how that AI was
> > > > trained, those using that AI's output might have some difficulty meeting
> > > > the requirements of the second portion of clause (a) of Developer's
> > > > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> > > > the open source license indicated in the file".  
> > > 
> > > If the argument is that cetain LLM generated code cannot be used for code under
> > > the DCO, then:
> > > 
> > > a) isn't this debatable? Do we want to itemize a safe list for AI models
> > >    which we think are safe to adopt for AI generated code?  
> > 
> > For my own work, I will continue to avoid use of AI-generated artifacts
> > for open-source software projects unless and until some of the more
> > consequential "debates" are resolved favorably.
> > 
> > > b) seems kind of too late  
> > 
> > Why?
> > 
> > > c) If something like the Generated-by tag is used, and we trust it, then
> > >    if we do want to side against merging AI generated code, that's perhaps our
> > >    only chance at blocking that type of code. Its however not bullet proof.  
> > 
> > Nothing is bullet proof.  ;-)
> 
> Let's face reality: before AI generation, more than one time I
> received completely identical patches from different developers
> with exactly the same content. Sometimes, even the descriptions
> were similar. I got one or twice the same description even.

But of course.  And in at least some jurisdictions, one exception to
copyright is when there is only one way to express a given concept.

> Granted, those are bug fixes for obvious fixes (usually one liners), but
> the point is: there are certain software patterns that are so common 
> that there are lots of developers around the globe whose are familiar
> with. This is not different from a AI: if one asks it to write a DPS code 
> in some language (C, C++, Python, you name it), I bet the code will be
> at least 90% similar to any other code you or anyone else would write.
> 
> The rationale is that we're all trained directly or indirectly
> (including AI) with the same textbook algorithms or from someone
> that used such textbooks.

That may be true, but we should expect copyright law to continue to be
vigorously enforced from time to time.  Yes, I believe that the Linux
kernel community is a great group of people, but there is neverthelss
no shortage of people who would be happy to take legal action against
us if they thought doing so might benefit them.

> I can't see AI making it any better or worse from what we already
> have.

My assumption is that any time I ask an AI a question, neither the
question nor the answer is in any way private to me.  In contrast, as
far as I know, my own thoughts are private to me.  Yes, yes, give or take
facial expression, body language, pheromones, and similar, but I do not
believe even the best experts are going to deduce my technical innovations
from such clues.  Naive of me, perhaps, but that is my firm belief.  ;-)

That difference is highly nontrivial, and could quite possibly make
things far worse for us.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-19 15:23             ` James Bottomley
@ 2025-08-19 16:16               ` Mauro Carvalho Chehab
  2025-08-20 21:44                 ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-19 16:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mauro Carvalho Chehab, Paul E. McKenney, Krzysztof Kozlowski,
	Sasha Levin, Jiri Kosina, ksummit

On Tue, Aug 19, 2025 at 04:23:46PM +0100, James Bottomley wrote:
> On August 18, 2025 10:07:29 PM GMT+01:00, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >Em Tue, 12 Aug 2025 07:42:21 -0700
> >"Paul E. McKenney" <paulmck@kernel.org> escreveu:
> [...]
> > do agree that many of the lawsuits seem to be motivated by an
> >> overwhelmening desire to monetize the output of AI that was induced by
> >> someone else's prompts, if that is what you are getting at.  It does seem
> >> to me personally that after you have sliced and diced the training data,
> >> fair use should apply, but last I checked, fair use was a USA-only thing.
> >
> >Maybe, but other Countries have similar concepts. I remember I saw an
> >interpretation of the Brazilian copyright law once from a famous layer
> >at property rights matter, stating that reproducing small parts of a book, 
> >for instance, could be ok, under certain circumstances (in a concept
> >similar to US fair use).
> 
> Yes, technically.  Article 10 of the Berne convention contains a weaker concept allowing quotations without encumbrance based on a three prong test that the quote isn't extensive,  doesn't rob the rights holder of substantial royalties and doesn't unreasonably prejudice the existing copyright rights.

Exactly. The interpretation from such speech I mentioned is based on that.
Now, exactly what is substantial is something that could be argued.

There are two scenarios to consider:

1. AI using public domain or Open Source licensed code;

There are so many variations of the same code patterns that AI
was trained, that it sounds unlikely that the produced output would
contain a substantial amount of the original code.

2. Public AI used to developt closed source 

If someone from VendorA trains a public AI to develop an IP protected driver
for HardwareA with a very specialized unique code, and someone asks the
 same AI to:

	"write a driver for HardwareA"

and get about the same code, then this would be a possible legal issue. 

Yet, on such case, the developer from VendorA, by using a public AI,
and allowed it to be trained with the code, opened the code to be used
elsewhere, eventually violating NDA. For instance, if he used
Chatgpt, this license term applies:

	"3. License to OpenAI

	 When you use the service, you grant OpenAI a license to use
	 your input for the purpose of providing and improving the 
	 service—this may include model training unless you’ve opted out.

	 This license is non-exclusive, worldwide, royalty-free, 
	 sublicensable—but it's only used as outlined in the Terms of Use
	 and privacy policies."

So, if he didn't opt-out, it granted ChatGPT and its users a patent-free
sublicensable code.

Ok, other LLM tools may have different terms, but if we end having
to many people trying to monetize from it, the usage terms will be
modified to prevent AI holders to face legal issues.

Still, while I'm not a lawyer, my understanding from the (2)
is that if one uses it for closed source development and allowed
implicitly or explicitly the inputs to be used for training, the one
that will be be accounted for, in cases envolving IP leaking, is the
person who submitted IP protected property to AI.

-- 
Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-19 15:25             ` Paul E. McKenney
@ 2025-08-19 16:27               ` Mauro Carvalho Chehab
  2025-08-20 22:03                 ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-19 16:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mauro Carvalho Chehab, Luis Chamberlain, Krzysztof Kozlowski,
	Sasha Levin, Jiri Kosina, ksummit

On Tue, Aug 19, 2025 at 08:25:39AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 18, 2025 at 11:23:32PM +0200, Mauro Carvalho Chehab wrote:
> > Em Mon, 11 Aug 2025 15:51:48 -0700
> > "Paul E. McKenney" <paulmck@kernel.org> escreveu:
> > 
> > > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > > > On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:  
> > > > > depending on how that AI was
> > > > > trained, those using that AI's output might have some difficulty meeting
> > > > > the requirements of the second portion of clause (a) of Developer's
> > > > > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> > > > > the open source license indicated in the file".  
> > > > 
> > > > If the argument is that cetain LLM generated code cannot be used for code under
> > > > the DCO, then:
> > > > 
> > > > a) isn't this debatable? Do we want to itemize a safe list for AI models
> > > >    which we think are safe to adopt for AI generated code?  
> > > 
> > > For my own work, I will continue to avoid use of AI-generated artifacts
> > > for open-source software projects unless and until some of the more
> > > consequential "debates" are resolved favorably.
> > > 
> > > > b) seems kind of too late  
> > > 
> > > Why?
> > > 
> > > > c) If something like the Generated-by tag is used, and we trust it, then
> > > >    if we do want to side against merging AI generated code, that's perhaps our
> > > >    only chance at blocking that type of code. Its however not bullet proof.  
> > > 
> > > Nothing is bullet proof.  ;-)
> > 
> > Let's face reality: before AI generation, more than one time I
> > received completely identical patches from different developers
> > with exactly the same content. Sometimes, even the descriptions
> > were similar. I got one or twice the same description even.
> 
> But of course.  And in at least some jurisdictions, one exception to
> copyright is when there is only one way to express a given concept.
> 
> > Granted, those are bug fixes for obvious fixes (usually one liners), but
> > the point is: there are certain software patterns that are so common 
> > that there are lots of developers around the globe whose are familiar
> > with. This is not different from a AI: if one asks it to write a DPS code 
> > in some language (C, C++, Python, you name it), I bet the code will be
> > at least 90% similar to any other code you or anyone else would write.
> > 
> > The rationale is that we're all trained directly or indirectly
> > (including AI) with the same textbook algorithms or from someone
> > that used such textbooks.
> 
> That may be true, but we should expect copyright law to continue to be
> vigorously enforced from time to time.  Yes, I believe that the Linux
> kernel community is a great group of people, but there is neverthelss
> no shortage of people who would be happy to take legal action against
> us if they thought doing so might benefit them.
> 
> > I can't see AI making it any better or worse from what we already
> > have.
> 
> My assumption is that any time I ask an AI a question, neither the
> question nor the answer is in any way private to me.

If you use a public service: no. If you run AI on ollama, for instance,
you're running AI locally on your machine, in priciple without access
to the Internet.

> In contrast, as
> far as I know, my own thoughts are private to me. 

Yes, up to the point you materialize them into something like a patch
and let others see your work. If you do it on a public ML, it is now
open to the public to know your ideas.

If one uses AI, his input data can be used to train the next version
of the model, after some time. So, it may still be closed to the
main audience for a couple of days/weeks/months (all depends on the
training policies - and on the AI vendor release windows).

So, if you don't want ever that other see your code, don't use AI,
maybe except via a local service like ollama. But, if you're using
AI to help with open source development, and you won't take too
much time to publish your work or it doesn't contain any special
recipe, it is probably ok to use a public AI service.

In the middle there are also paywalled AIs where the vendor
gives some assurances about using (or not) your data for the
model training.

> Yes, yes, give or take
> facial expression, body language, pheromones, and similar, but I do not
> believe even the best experts are going to deduce my technical innovations
> from such clues.  Naive of me, perhaps, but that is my firm belief.  ;-)
> 
> That difference is highly nontrivial, and could quite possibly make
> things far worse for us.

-- 
Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-19 16:16               ` Mauro Carvalho Chehab
@ 2025-08-20 21:44                 ` Paul E. McKenney
  2025-08-21 10:23                   ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-20 21:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: James Bottomley, Jiri Kosina, ksummit

On Tue, Aug 19, 2025 at 06:16:10PM +0200, Mauro Carvalho Chehab wrote:
> On Tue, Aug 19, 2025 at 04:23:46PM +0100, James Bottomley wrote:
> > On August 18, 2025 10:07:29 PM GMT+01:00, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > >Em Tue, 12 Aug 2025 07:42:21 -0700
> > >"Paul E. McKenney" <paulmck@kernel.org> escreveu:
> > [...]
> > > do agree that many of the lawsuits seem to be motivated by an
> > >> overwhelmening desire to monetize the output of AI that was induced by
> > >> someone else's prompts, if that is what you are getting at.  It does seem
> > >> to me personally that after you have sliced and diced the training data,
> > >> fair use should apply, but last I checked, fair use was a USA-only thing.
> > >
> > >Maybe, but other Countries have similar concepts. I remember I saw an
> > >interpretation of the Brazilian copyright law once from a famous layer
> > >at property rights matter, stating that reproducing small parts of a book, 
> > >for instance, could be ok, under certain circumstances (in a concept
> > >similar to US fair use).
> > 
> > Yes, technically.  Article 10 of the Berne convention contains a weaker concept allowing quotations without encumbrance based on a three prong test that the quote isn't extensive,  doesn't rob the rights holder of substantial royalties and doesn't unreasonably prejudice the existing copyright rights.
> 
> Exactly. The interpretation from such speech I mentioned is based on that.
> Now, exactly what is substantial is something that could be argued.
> 
> There are two scenarios to consider:
> 
> 1. AI using public domain or Open Source licensed code;
> 
> There are so many variations of the same code patterns that AI
> was trained, that it sounds unlikely that the produced output would
> contain a substantial amount of the original code.
> 
> 2. Public AI used to developt closed source 
> 
> If someone from VendorA trains a public AI to develop an IP protected driver
> for HardwareA with a very specialized unique code, and someone asks the
>  same AI to:
> 
> 	"write a driver for HardwareA"
> 
> and get about the same code, then this would be a possible legal issue. 
> 
> Yet, on such case, the developer from VendorA, by using a public AI,
> and allowed it to be trained with the code, opened the code to be used
> elsewhere, eventually violating NDA. For instance, if he used
> Chatgpt, this license term applies:
> 
> 	"3. License to OpenAI
> 
> 	 When you use the service, you grant OpenAI a license to use
> 	 your input for the purpose of providing and improving the 
> 	 service—this may include model training unless you’ve opted out.
> 
> 	 This license is non-exclusive, worldwide, royalty-free, 
> 	 sublicensable—but it's only used as outlined in the Terms of Use
> 	 and privacy policies."
> 
> So, if he didn't opt-out, it granted ChatGPT and its users a patent-free
> sublicensable code.
> 
> Ok, other LLM tools may have different terms, but if we end having
> to many people trying to monetize from it, the usage terms will be
> modified to prevent AI holders to face legal issues.
> 
> Still, while I'm not a lawyer, my understanding from the (2)
> is that if one uses it for closed source development and allowed
> implicitly or explicitly the inputs to be used for training, the one
> that will be be accounted for, in cases envolving IP leaking, is the
> person who submitted IP protected property to AI.

Many of the AI players scrape the web, and might well pull in training
data from web pages having a restrictive copyright.  The AI's output
might then be influenced by that restricted training data.  Although we
might desperately want this not to be a problem for AI-based submissions
to the Linux kernel, what we want and what the various legal systems
actually give us are not guaranteed to have much relation to each other.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-18 21:41             ` Mauro Carvalho Chehab
@ 2025-08-20 21:48               ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-20 21:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luis Chamberlain, Krzysztof Kozlowski, Sasha Levin, Jiri Kosina, ksummit

On Mon, Aug 18, 2025 at 11:41:29PM +0200, Mauro Carvalho Chehab wrote:
> Em Mon, 11 Aug 2025 16:22:21 -0700
> Luis Chamberlain <mcgrof@kernel.org> escreveu:
> 
> > On Mon, Aug 11, 2025 at 03:51:48PM -0700, Paul E. McKenney wrote:
> > > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:  
> > > > b) seems kind of too late  
> > > 
> > > Why?  
> > 
> > One cannot assume at this point AI generated code has not been merged
> > into any large scale open source project.
> > 
> > I am also not sure it can be stopped.
> 
> Agreed. The same applies to all other patches: nobody can really tell if
> some code could potentially contain code not developed by the submitter.
> 
> To be frank, considering that most companies nowadays have policies of
> not using public AI for private code, I suspect that AI generated code
> contains only public domain code or open source. As open source licenses
> explicitly allow one to learn from the written code, except if AI
> (and the developer using it) are just copying the code, it will very
> likely be at the already allowed open source license clauses.
> 
> Now, when someone from a company submits a patch for the company
> hardware, for instance, it is a lot harder for a maintainer to be
> sure that such submission was approved. The SoB is a sort of
> protection for us, as the submitter declared that he had the
> permissions.
> 
> So, at least from my side, provided that the patch is good(*), I'm not
> concerned if it used AI to help him or not.
> 
> (*) good patch means that, even if AI was used, a human adjusted
>     it to ensure its quality.

I am with you in theory on relying on the SoB, but in practice we have
not yet clearly stated what the SoB rules are with respect to AI output.
This is largely because the legal rules that we will need to align with
are still in the process of being established, and rather messily via
a large storm of legal actions.

Again, although I hope that it eventually proves to be safe to use AI
output in Linux-kernel patches, it would be foolish to count on this
at present.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-19 16:27               ` Mauro Carvalho Chehab
@ 2025-08-20 22:03                 ` Paul E. McKenney
  2025-08-21 10:54                   ` Miguel Ojeda
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-20 22:03 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Jiri Kosina, ksummit

On Tue, Aug 19, 2025 at 06:27:20PM +0200, Mauro Carvalho Chehab wrote:
> On Tue, Aug 19, 2025 at 08:25:39AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 18, 2025 at 11:23:32PM +0200, Mauro Carvalho Chehab wrote:
> > > Em Mon, 11 Aug 2025 15:51:48 -0700
> > > "Paul E. McKenney" <paulmck@kernel.org> escreveu:
> > > 
> > > > On Mon, Aug 11, 2025 at 03:11:47PM -0700, Luis Chamberlain wrote:
> > > > > On Mon, Aug 11, 2025 at 02:46:11PM -0700, Paul E. McKenney wrote:  
> > > > > > depending on how that AI was
> > > > > > trained, those using that AI's output might have some difficulty meeting
> > > > > > the requirements of the second portion of clause (a) of Developer's
> > > > > > Certificate of Origin (DCO) 1.1: "I have the right to submit it under
> > > > > > the open source license indicated in the file".  
> > > > > 
> > > > > If the argument is that cetain LLM generated code cannot be used for code under
> > > > > the DCO, then:
> > > > > 
> > > > > a) isn't this debatable? Do we want to itemize a safe list for AI models
> > > > >    which we think are safe to adopt for AI generated code?  
> > > > 
> > > > For my own work, I will continue to avoid use of AI-generated artifacts
> > > > for open-source software projects unless and until some of the more
> > > > consequential "debates" are resolved favorably.
> > > > 
> > > > > b) seems kind of too late  
> > > > 
> > > > Why?
> > > > 
> > > > > c) If something like the Generated-by tag is used, and we trust it, then
> > > > >    if we do want to side against merging AI generated code, that's perhaps our
> > > > >    only chance at blocking that type of code. Its however not bullet proof.  
> > > > 
> > > > Nothing is bullet proof.  ;-)
> > > 
> > > Let's face reality: before AI generation, more than one time I
> > > received completely identical patches from different developers
> > > with exactly the same content. Sometimes, even the descriptions
> > > were similar. I got one or twice the same description even.
> > 
> > But of course.  And in at least some jurisdictions, one exception to
> > copyright is when there is only one way to express a given concept.
> > 
> > > Granted, those are bug fixes for obvious fixes (usually one liners), but
> > > the point is: there are certain software patterns that are so common 
> > > that there are lots of developers around the globe whose are familiar
> > > with. This is not different from a AI: if one asks it to write a DPS code 
> > > in some language (C, C++, Python, you name it), I bet the code will be
> > > at least 90% similar to any other code you or anyone else would write.
> > > 
> > > The rationale is that we're all trained directly or indirectly
> > > (including AI) with the same textbook algorithms or from someone
> > > that used such textbooks.
> > 
> > That may be true, but we should expect copyright law to continue to be
> > vigorously enforced from time to time.  Yes, I believe that the Linux
> > kernel community is a great group of people, but there is neverthelss
> > no shortage of people who would be happy to take legal action against
> > us if they thought doing so might benefit them.
> > 
> > > I can't see AI making it any better or worse from what we already
> > > have.
> > 
> > My assumption is that any time I ask an AI a question, neither the
> > question nor the answer is in any way private to me.
> 
> If you use a public service: no. If you run AI on ollama, for instance,
> you're running AI locally on your machine, in priciple without access
> to the Internet.
> 
> > In contrast, as
> > far as I know, my own thoughts are private to me. 
> 
> Yes, up to the point you materialize them into something like a patch
> and let others see your work. If you do it on a public ML, it is now
> open to the public to know your ideas.

It is far worse than that.  If I post a patch that I generated with my
own wetware, all people see is the patch itself, along with any public
design documentation that I might have produced along the way.

If I use a public ML, much more data is available, perhaps to bad actors,
on what training data went into producing that patch.  Absent some remote
mind-reading technology, that kind of data is simply not available for
wetware-generated patches.

Please understand that this is a very important difference.

> If one uses AI, his input data can be used to train the next version
> of the model, after some time. So, it may still be closed to the
> main audience for a couple of days/weeks/months (all depends on the
> training policies - and on the AI vendor release windows).
> 
> So, if you don't want ever that other see your code, don't use AI,
> maybe except via a local service like ollama. But, if you're using
> AI to help with open source development, and you won't take too
> much time to publish your work or it doesn't contain any special
> recipe, it is probably ok to use a public AI service.

Again, I am not anywhere near as worried about use of some AI-generated
patch after publication as I am about use of the connection of that
patch to the training data that helped to generate it.

Use of a local service might seem attractive, but even if you somehow
know for sure that it doesn't send your prompts off somewhere, it very
likely at least logs them, for customer-service purposes if nothing else.
Which might be less obviously troubling that broadcasting the prompts
publicly, but any logged prompts are still discoverable in the legal
sense.

Please understand that you are communicating with someone who once had
lawyers come in an photocopy all the paper in his cube and copy out all
the mass storage of all of his devices.  This is not at all theoretical.

> In the middle there are also paywalled AIs where the vendor
> gives some assurances about using (or not) your data for the
> model training.

Assurrances are nice, but ransomware and other attack vectors can
render assurrances meaningless, all of the vendor's good intentions
notwithstanding.

							Thanx, Paul

> > Yes, yes, give or take
> > facial expression, body language, pheromones, and similar, but I do not
> > believe even the best experts are going to deduce my technical innovations
> > from such clues.  Naive of me, perhaps, but that is my firm belief.  ;-)
> > 
> > That difference is highly nontrivial, and could quite possibly make
> > things far worse for us.
> 
> -- 
> Thanks,
> Mauro
> 

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-20 21:44                 ` Paul E. McKenney
@ 2025-08-21 10:23                   ` Mauro Carvalho Chehab
  2025-08-21 16:50                     ` Steven Rostedt
                                       ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-21 10:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: James Bottomley, Jiri Kosina, ksummit

Em Wed, 20 Aug 2025 14:44:00 -0700
"Paul E. McKenney" <paulmck@kernel.org> escreveu:

> On Tue, Aug 19, 2025 at 06:16:10PM +0200, Mauro Carvalho Chehab wrote:
> > On Tue, Aug 19, 2025 at 04:23:46PM +0100, James Bottomley wrote:  
> > > On August 18, 2025 10:07:29 PM GMT+01:00, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:  
> > > >Em Tue, 12 Aug 2025 07:42:21 -0700
> > > >"Paul E. McKenney" <paulmck@kernel.org> escreveu:  
> > > [...]  
> > > > do agree that many of the lawsuits seem to be motivated by an  
> > > >> overwhelmening desire to monetize the output of AI that was induced by
> > > >> someone else's prompts, if that is what you are getting at.  It does seem
> > > >> to me personally that after you have sliced and diced the training data,
> > > >> fair use should apply, but last I checked, fair use was a USA-only thing.  
> > > >
> > > >Maybe, but other Countries have similar concepts. I remember I saw an
> > > >interpretation of the Brazilian copyright law once from a famous layer
> > > >at property rights matter, stating that reproducing small parts of a book, 
> > > >for instance, could be ok, under certain circumstances (in a concept
> > > >similar to US fair use).  
> > > 
> > > Yes, technically.  Article 10 of the Berne convention contains a weaker concept allowing quotations without encumbrance based on a three prong test that the quote isn't extensive,  doesn't rob the rights holder of substantial royalties and doesn't unreasonably prejudice the existing copyright rights.  
> > 
> > Exactly. The interpretation from such speech I mentioned is based on that.
> > Now, exactly what is substantial is something that could be argued.
> > 
> > There are two scenarios to consider:
> > 
> > 1. AI using public domain or Open Source licensed code;
> > 
> > There are so many variations of the same code patterns that AI
> > was trained, that it sounds unlikely that the produced output would
> > contain a substantial amount of the original code.
> > 
> > 2. Public AI used to developt closed source 
> > 
> > If someone from VendorA trains a public AI to develop an IP protected driver
> > for HardwareA with a very specialized unique code, and someone asks the
> >  same AI to:
> > 
> > 	"write a driver for HardwareA"
> > 
> > and get about the same code, then this would be a possible legal issue. 
> > 
> > Yet, on such case, the developer from VendorA, by using a public AI,
> > and allowed it to be trained with the code, opened the code to be used
> > elsewhere, eventually violating NDA. For instance, if he used
> > Chatgpt, this license term applies:
> > 
> > 	"3. License to OpenAI
> > 
> > 	 When you use the service, you grant OpenAI a license to use
> > 	 your input for the purpose of providing and improving the 
> > 	 service—this may include model training unless you’ve opted out.
> > 
> > 	 This license is non-exclusive, worldwide, royalty-free, 
> > 	 sublicensable—but it's only used as outlined in the Terms of Use
> > 	 and privacy policies."
> > 
> > So, if he didn't opt-out, it granted ChatGPT and its users a patent-free
> > sublicensable code.
> > 
> > Ok, other LLM tools may have different terms, but if we end having
> > to many people trying to monetize from it, the usage terms will be
> > modified to prevent AI holders to face legal issues.
> > 
> > Still, while I'm not a lawyer, my understanding from the (2)
> > is that if one uses it for closed source development and allowed
> > implicitly or explicitly the inputs to be used for training, the one
> > that will be be accounted for, in cases envolving IP leaking, is the
> > person who submitted IP protected property to AI.  
> 
> Many of the AI players scrape the web, and might well pull in training
> data from web pages having a restrictive copyright.  The AI's output
> might then be influenced by that restricted training data. 

True, but this is not different than a developer seeking the web for
answers of his development problems, reading textbooks and/or reading 
articles.

Also, if someone publicly document something an any sort of media,
it is expected that people will read, adquire knowledge from it and
eventually materialize the acquired knowledge into something. This
is fair use, and has some provision from Berne convention, although
it may depend on each Country's specific laws.

On my view, if the trained data comes from lots of different
places, as AI is actually a stochastic process that write
code by predicting the next code words, if there's just one web 
site with an specific pattern, the chances of getting exactly
the same code are pretty low. It is a way more likely that humans
would pick exactly the same code as written on his favorite
textbook than an LLM feed with hundreds of thousands of web
sites.

> Although we
> might desperately want this not to be a problem for AI-based submissions
> to the Linux kernel, what we want and what the various legal systems
> actually give us are not guaranteed to have much relation to each other.

True, but that's not the point. AI is not that different than
someone googling the net to seek for answers.

The only difference is that, when AI is used, you won't know
exactly from where the code was based.

I agree that this could be problematic. But then, again, when a maintainer 
picks a patch from someone else, the same applies: we don't have any
guaranties that the code was not just copied-and-pasted from some place,
except by the SoB.

In any case (either AI, human or hybrid AI/human), if the code has issues,
we may need to revert it.

On other words, AI doesn't radically changes it: at the end, all remains
the same.

That's why I don't think we'll get any new information nor need to
follow any procedure different than what we already do, if the developer
had used AI, and to what extent.

-

Now, a completely different thing is if we start having "incompetent"
developers ("incompetent" in the sense given by the Dilbert Principle) that
have some AI bot patch-generator to write patches they can't do themselves.

I'll certainly reject such patches and place such individuals on my
reject list.


Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-20 22:03                 ` Paul E. McKenney
@ 2025-08-21 10:54                   ` Miguel Ojeda
  2025-08-21 11:46                     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 97+ messages in thread
From: Miguel Ojeda @ 2025-08-21 10:54 UTC (permalink / raw)
  To: paulmck; +Cc: Mauro Carvalho Chehab, Jiri Kosina, ksummit

On Thu, Aug 21, 2025 at 12:03 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Use of a local service might seem attractive, but even if you somehow
> know for sure that it doesn't send your prompts off somewhere, it very
> likely at least logs them, for customer-service purposes if nothing else.
> Which might be less obviously troubling that broadcasting the prompts
> publicly, but any logged prompts are still discoverable in the legal
> sense.

I think by "local service" Mauro may mean, in general, open source
projects that do not require network access and that would not have
customer service in the commercial sense and so on. Some open source
projects still have logging or telemetry, of course -- I don't know
how common that is in libraries/apps of that domain -- but if so I
guess forks would appear, or people would run them in isolated VMs if
they are concerned about things like that, etc.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 10:54                   ` Miguel Ojeda
@ 2025-08-21 11:46                     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-21 11:46 UTC (permalink / raw)
  To: Miguel Ojeda; +Cc: paulmck, Jiri Kosina, ksummit

Em Thu, 21 Aug 2025 12:54:42 +0200
Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> escreveu:

> On Thu, Aug 21, 2025 at 12:03 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Use of a local service might seem attractive, but even if you somehow
> > know for sure that it doesn't send your prompts off somewhere, it very
> > likely at least logs them, for customer-service purposes if nothing else.
> > Which might be less obviously troubling that broadcasting the prompts
> > publicly, but any logged prompts are still discoverable in the legal
> > sense.  
> 
> I think by "local service" Mauro may mean, in general, open source
> projects that do not require network access and that would not have
> customer service in the commercial sense and so on. Some open source
> projects still have logging or telemetry, of course -- I don't know
> how common that is in libraries/apps of that domain -- but if so I
> guess forks would appear, or people would run them in isolated VMs if
> they are concerned about things like that, etc.

As far as I know, running ollama locally won't send any telemetry,
and can run even without Internet, but I'm not an expert and never
looked on its source code[1].

[1] https://github.com/ollama/ollama

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 10:23                   ` Mauro Carvalho Chehab
@ 2025-08-21 16:50                     ` Steven Rostedt
  2025-08-21 17:30                       ` Mauro Carvalho Chehab
  2025-08-21 20:38                     ` Jiri Kosina
  2025-08-21 20:46                     ` Paul E. McKenney
  2 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-21 16:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

On Thu, 21 Aug 2025 12:23:29 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> > Many of the AI players scrape the web, and might well pull in training
> > data from web pages having a restrictive copyright.  The AI's output
> > might then be influenced by that restricted training data.   
> 
> True, but this is not different than a developer seeking the web for
> answers of his development problems, reading textbooks and/or reading 
> articles.

The difference I believe is that AI is still a computer program. It could,
in theory, copy something exactly as is, where copyright does matter.

If you read something and was able to rewrite it verbatim, you would be
subject to copyright infringement if what you read had limits on how you
could reproduce it.

> 
> Also, if someone publicly document something an any sort of media,
> it is expected that people will read, adquire knowledge from it and
> eventually materialize the acquired knowledge into something. This
> is fair use, and has some provision from Berne convention, although
> it may depend on each Country's specific laws.

You can learn from it, but it also comes down to how much you actually copy
from it.

> 
> On my view, if the trained data comes from lots of different
> places, as AI is actually a stochastic process that write
> code by predicting the next code words, if there's just one web 
> site with an specific pattern, the chances of getting exactly
> the same code are pretty low. It is a way more likely that humans
> would pick exactly the same code as written on his favorite
> textbook than an LLM feed with hundreds of thousands of web
> sites.

The issue I have with the above statement is, how would you know if the AI
copied something verbatim or not? Are you going to ask it? "Hey, AI, was
this code a direct copy of anything?" Would you trust its answer?

For a human to do the same, they would have to knowingly have done the copy.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 16:50                     ` Steven Rostedt
@ 2025-08-21 17:30                       ` Mauro Carvalho Chehab
  2025-08-21 17:36                         ` Luck, Tony
                                           ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-21 17:30 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

Em Thu, 21 Aug 2025 12:50:37 -0400
Steven Rostedt <rostedt@goodmis.org> escreveu:

> On Thu, 21 Aug 2025 12:23:29 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > > Many of the AI players scrape the web, and might well pull in training
> > > data from web pages having a restrictive copyright.  The AI's output
> > > might then be influenced by that restricted training data.     
> > 
> > True, but this is not different than a developer seeking the web for
> > answers of his development problems, reading textbooks and/or reading 
> > articles.  
> 
> The difference I believe is that AI is still a computer program. It could,
> in theory, copy something exactly as is, where copyright does matter.
> 
> If you read something and was able to rewrite it verbatim, you would be
> subject to copyright infringement if what you read had limits on how you
> could reproduce it.

Maybe at the early days of LLM this could be true, but now that they're
massively trained by bots, the number of places it retrieves data for
its training is very large, and considering how artificial neurons
work, they will only store patterns with a high number of repetitions. 

Now, if one asks it to do a web search, then the result can be 
biased, just like if you google it at the web.

> > Also, if someone publicly document something an any sort of media,
> > it is expected that people will read, adquire knowledge from it and
> > eventually materialize the acquired knowledge into something. This
> > is fair use, and has some provision from Berne convention, although
> > it may depend on each Country's specific laws.  
> 
> You can learn from it, but it also comes down to how much you actually copy
> from it.
> 
> > 
> > On my view, if the trained data comes from lots of different
> > places, as AI is actually a stochastic process that write
> > code by predicting the next code words, if there's just one web 
> > site with an specific pattern, the chances of getting exactly
> > the same code are pretty low. It is a way more likely that humans
> > would pick exactly the same code as written on his favorite
> > textbook than an LLM feed with hundreds of thousands of web
> > sites.  
> 
> The issue I have with the above statement is, how would you know if the AI
> copied something verbatim or not? Are you going to ask it? "Hey, AI, was
> this code a direct copy of anything?" Would you trust its answer?
> 
> For a human to do the same, they would have to knowingly have done the copy.

Heh, if I ask you to write a C code to write something...

...
...
...
... 

I bet that one of the first things (if not the first) you
considered was: printf("Hello world!"). 

I also bet you can't remember the first time you saw it.

Ok, this is a very small code, but still there are some patterns
that we learn over time and we keep repeating on our code without
knowing from where they came from, nor remembering if there was
a copyright from where we picked it or not.

In my case, I probably saw my first "Hello world" either on a book
or on some magazine a lot of time ago that was copyrighted by its
authors, but I can't tell you for sure when I first saw it.

Do you remember the first time you saw that, and what copyrights
were there? :-)

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 17:30                       ` Mauro Carvalho Chehab
@ 2025-08-21 17:36                         ` Luck, Tony
  2025-08-21 18:01                           ` Mauro Carvalho Chehab
  2025-08-21 17:53                         ` Steven Rostedt
  2025-08-22  7:55                         ` Geert Uytterhoeven
  2 siblings, 1 reply; 97+ messages in thread
From: Luck, Tony @ 2025-08-21 17:36 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Steven Rostedt
  Cc: Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

> Do you remember the first time you saw that, and what copyrights
> were there? :-)

Kernighan and Ritche "The C programming language" - First edition.

-Tony

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 17:30                       ` Mauro Carvalho Chehab
  2025-08-21 17:36                         ` Luck, Tony
@ 2025-08-21 17:53                         ` Steven Rostedt
  2025-08-21 18:32                           ` Mauro Carvalho Chehab
  2025-08-22  7:55                         ` Geert Uytterhoeven
  2 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-21 17:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

On Thu, 21 Aug 2025 19:30:41 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> I bet that one of the first things (if not the first) you
> considered was: printf("Hello world!"). 

I do believe "how much" you copy is important in the infringement of a
copyright. I believe this is how "sampling" works in songs. If you only
copy a little, it's not considered infringement. At least that's the way I
believe it works.

Now "Hello world!" may not be enough code to copyright.

> 
> I also bet you can't remember the first time you saw it.

As Tony Luck replied, I remember when I first saw it in the K&R book back
in college.

The funny part is, I still have my book. And looking at it, even though it
recommends to use the examples, I can't seem to find where it gives you the
right to use them. The start of the book has:

  All rights reserved. No part of this publication may be reproduced,
  stored in a retrieval system, or transmitted, in any form or by any mean,
  electronic, mechanical, photocopying, recording, or otherwise, without
  the prior written permission of the publisher.

Hmm.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 17:36                         ` Luck, Tony
@ 2025-08-21 18:01                           ` Mauro Carvalho Chehab
  2025-08-21 19:03                             ` Steven Rostedt
  2025-08-21 21:21                             ` Paul E. McKenney
  0 siblings, 2 replies; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-21 18:01 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Steven Rostedt, Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

Em Thu, 21 Aug 2025 17:36:54 +0000
"Luck, Tony" <tony.luck@intel.com> escreveu:

> > Do you remember the first time you saw that, and what copyrights
> > were there? :-)  
> 
> Kernighan and Ritche "The C programming language" - First edition.

I saw it there too, but I probably saw it before that, on an "80 Micro"
Magazine edition which I don't recall anymore.

Btw, Wikipedia says it came from a BCPL code (*). So, K&R were not the
original authors.

(*) https://en.wikipedia.org/wiki/%22Hello,_World!%22_program

Anyway, the point is: i we weren't trained with such pattern, 
a printf() with "Hey" or "Hi" would be a more likely answer.

That's said, in the early programming days, I used a lot more
this pattern (**):

	The quick brown fox jumps over the lazy dog

(**) https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog

with has all 23 English characters. I have absolutely no glue when
I first saw it, but it was before I got "C Programming Language" 
book in hands, as I used it for a code I developed in Assembler
before learning C.

Yet, as I saw a lot more the "Hello world", I haven't using the
brown fox pattern for years.

Anyway, the point is: AI repeat patterns, but it will very likely
repeat the ones that are used on tons of different places, where
it is really hard to have any copyrights applied (as they become
common sense). Humans do the same.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 17:53                         ` Steven Rostedt
@ 2025-08-21 18:32                           ` Mauro Carvalho Chehab
  2025-08-21 19:07                             ` Steven Rostedt
  0 siblings, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-21 18:32 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

Em Thu, 21 Aug 2025 13:53:29 -0400
Steven Rostedt <rostedt@goodmis.org> escreveu:

> On Thu, 21 Aug 2025 19:30:41 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > I bet that one of the first things (if not the first) you
> > considered was: printf("Hello world!").   
> 
> I do believe "how much" you copy is important in the infringement of a
> copyright. I believe this is how "sampling" works in songs. If you only
> copy a little, it's not considered infringement. At least that's the way I
> believe it works.

True, and that's the point: Berne convention with their derivative
Country-specific copyright laws have the "fair use" concept (or
similar) allowing people to copy a little. 

Yet, in the case of:

	"The quick brown fox jumps over the lazy dog"

this is far than obvious. Fortunatelly, as this was written
first time (it seems) on February 9, 1885, copyrights already
expired.

> Now "Hello world!" may not be enough code to copyright.
> 
> > 
> > I also bet you can't remember the first time you saw it.  
> 
> As Tony Luck replied, I remember when I first saw it in the K&R book back
> in college.
> 
> The funny part is, I still have my book. And looking at it, even though it
> recommends to use the examples, I can't seem to find where it gives you the
> right to use them. The start of the book has:
> 
>   All rights reserved. No part of this publication may be reproduced,
>   stored in a retrieval system, or transmitted, in any form or by any mean,
>   electronic, mechanical, photocopying, recording, or otherwise, without
>   the prior written permission of the publisher.
> 
> Hmm.

Heh, and I bet you never ever considered using it as a copyright
infringement (it is not, due to fair use), but you're probably
repeating, even without knowing it, other patterns you saw there
and on other places.

Btw, even in the case of a bigger pattern you saw there and you
may be repeating, you won't be the only one doing it: an entire
generation that used K&C textbook are also repeating them. Plus
the ones that used newer books whose authors got inspired from
it.

In practice, even with the original book's copyright, I doubt
anyone could actually enforce copyrights if one picks one of the
book's code and use as-is (and more likely one would adjust
coding style, parameter pass logic, etc).

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 18:01                           ` Mauro Carvalho Chehab
@ 2025-08-21 19:03                             ` Steven Rostedt
  2025-08-21 19:45                               ` Mauro Carvalho Chehab
  2025-08-21 21:21                             ` Paul E. McKenney
  1 sibling, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-21 19:03 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

On Thu, 21 Aug 2025 20:01:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> (**) https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog
> 
> with has all 23 English characters.

               26  ;-)

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 18:32                           ` Mauro Carvalho Chehab
@ 2025-08-21 19:07                             ` Steven Rostedt
  2025-08-21 19:52                               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-21 19:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

On Thu, 21 Aug 2025 20:32:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Btw, even in the case of a bigger pattern you saw there and you
> may be repeating, you won't be the only one doing it: an entire
> generation that used K&C textbook are also repeating them. Plus
> the ones that used newer books whose authors got inspired from
> it.

Since the authors actively encouraged people to use their examples, there's
no incentive to go after anyone.

> 
> In practice, even with the original book's copyright, I doubt
> anyone could actually enforce copyrights if one picks one of the
> book's code and use as-is (and more likely one would adjust
> coding style, parameter pass logic, etc).

I'm not so sure. But since most people who write coding books want people
to use their work, there's been no precedence on someone going after
someone for using code from a book (that I know of).

But there's a lot of assumptions in this thread, and I fear that those that
take a too lenient approach to AI may get burned by it.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 19:03                             ` Steven Rostedt
@ 2025-08-21 19:45                               ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-21 19:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luck, Tony, Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

Em Thu, 21 Aug 2025 15:03:24 -0400
Steven Rostedt <rostedt@goodmis.org> escreveu:

> On Thu, 21 Aug 2025 20:01:59 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > (**) https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog
> > 
> > with has all 23 English characters.  
> 
>                26  ;-)

:-)

Yeah, 23 is on Portuguese (accents are not accounted).

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 19:07                             ` Steven Rostedt
@ 2025-08-21 19:52                               ` Mauro Carvalho Chehab
  2025-08-21 21:23                                 ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-21 19:52 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

Em Thu, 21 Aug 2025 15:07:57 -0400
Steven Rostedt <rostedt@goodmis.org> escreveu:

> On Thu, 21 Aug 2025 20:32:59 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Btw, even in the case of a bigger pattern you saw there and you
> > may be repeating, you won't be the only one doing it: an entire
> > generation that used K&C textbook are also repeating them. Plus
> > the ones that used newer books whose authors got inspired from
> > it.  
> 
> Since the authors actively encouraged people to use their examples, there's
> no incentive to go after anyone.

True.

> > In practice, even with the original book's copyright, I doubt
> > anyone could actually enforce copyrights if one picks one of the
> > book's code and use as-is (and more likely one would adjust
> > coding style, parameter pass logic, etc).  
> 
> I'm not so sure. But since most people who write coding books want people
> to use their work, there's been no precedence on someone going after
> someone for using code from a book (that I know of).
> 
> But there's a lot of assumptions in this thread, and I fear that those that
> take a too lenient approach to AI may get burned by it.

AI is new, so yeah, there's always a risk. But then again, there's a
risk already without it. I don't think the risk is too much different.
Perhaps it is even lower, as all major companies are investing in AI,
and they don't want to be sued. Plus, they're much more interested on
the direct revenue AI can produce for them. So, probably there aren't
much intent to try costly legal actions with low chances to monetize
by going  after people using AI.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 10:23                   ` Mauro Carvalho Chehab
  2025-08-21 16:50                     ` Steven Rostedt
@ 2025-08-21 20:38                     ` Jiri Kosina
  2025-08-21 21:18                       ` Jiri Kosina
  2025-08-21 20:46                     ` Paul E. McKenney
  2 siblings, 1 reply; 97+ messages in thread
From: Jiri Kosina @ 2025-08-21 20:38 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Paul E. McKenney, James Bottomley, ksummit

On Thu, 21 Aug 2025, Mauro Carvalho Chehab wrote:

> In any case (either AI, human or hybrid AI/human), if the code has issues,
> we may need to revert it.
> 
> On other words, AI doesn't radically changes it: at the end, all remains
> the same.

The code is rarely 1:1 copy-pasted, both by humans and AI.

Transformations are needed, you need to glue individual pieces together, 
adapt for a different version of API, yada yada yada.

When done by human, there is some hope that the human does understand what 
he/she is doing in the process, and you can reach out to them for 
human-to-human discussion about the code.

With AI-generated code, there might be no such human to talk to who 
understands what the code does and why.

And one of the points why I originally brought this up is that I believe 
we need either (a) be able to take the informed decision/risk by applying 
a patch we know has been written by AI, or (b) be able to outright reject 
it on that basis (e.g. if it's too complicated).

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 10:23                   ` Mauro Carvalho Chehab
  2025-08-21 16:50                     ` Steven Rostedt
  2025-08-21 20:38                     ` Jiri Kosina
@ 2025-08-21 20:46                     ` Paul E. McKenney
  2 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-21 20:46 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: James Bottomley, Jiri Kosina, ksummit

On Thu, Aug 21, 2025 at 12:23:29PM +0200, Mauro Carvalho Chehab wrote:
> Em Wed, 20 Aug 2025 14:44:00 -0700
> "Paul E. McKenney" <paulmck@kernel.org> escreveu:
> 
> > On Tue, Aug 19, 2025 at 06:16:10PM +0200, Mauro Carvalho Chehab wrote:
> > > On Tue, Aug 19, 2025 at 04:23:46PM +0100, James Bottomley wrote:  
> > > > On August 18, 2025 10:07:29 PM GMT+01:00, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:  
> > > > >Em Tue, 12 Aug 2025 07:42:21 -0700
> > > > >"Paul E. McKenney" <paulmck@kernel.org> escreveu:  
> > > > [...]  
> > > > > do agree that many of the lawsuits seem to be motivated by an  
> > > > >> overwhelmening desire to monetize the output of AI that was induced by
> > > > >> someone else's prompts, if that is what you are getting at.  It does seem
> > > > >> to me personally that after you have sliced and diced the training data,
> > > > >> fair use should apply, but last I checked, fair use was a USA-only thing.  
> > > > >
> > > > >Maybe, but other Countries have similar concepts. I remember I saw an
> > > > >interpretation of the Brazilian copyright law once from a famous layer
> > > > >at property rights matter, stating that reproducing small parts of a book, 
> > > > >for instance, could be ok, under certain circumstances (in a concept
> > > > >similar to US fair use).  
> > > > 
> > > > Yes, technically.  Article 10 of the Berne convention contains a weaker concept allowing quotations without encumbrance based on a three prong test that the quote isn't extensive,  doesn't rob the rights holder of substantial royalties and doesn't unreasonably prejudice the existing copyright rights.  
> > > 
> > > Exactly. The interpretation from such speech I mentioned is based on that.
> > > Now, exactly what is substantial is something that could be argued.
> > > 
> > > There are two scenarios to consider:
> > > 
> > > 1. AI using public domain or Open Source licensed code;
> > > 
> > > There are so many variations of the same code patterns that AI
> > > was trained, that it sounds unlikely that the produced output would
> > > contain a substantial amount of the original code.
> > > 
> > > 2. Public AI used to developt closed source 
> > > 
> > > If someone from VendorA trains a public AI to develop an IP protected driver
> > > for HardwareA with a very specialized unique code, and someone asks the
> > >  same AI to:
> > > 
> > > 	"write a driver for HardwareA"
> > > 
> > > and get about the same code, then this would be a possible legal issue. 
> > > 
> > > Yet, on such case, the developer from VendorA, by using a public AI,
> > > and allowed it to be trained with the code, opened the code to be used
> > > elsewhere, eventually violating NDA. For instance, if he used
> > > Chatgpt, this license term applies:
> > > 
> > > 	"3. License to OpenAI
> > > 
> > > 	 When you use the service, you grant OpenAI a license to use
> > > 	 your input for the purpose of providing and improving the 
> > > 	 service—this may include model training unless you’ve opted out.
> > > 
> > > 	 This license is non-exclusive, worldwide, royalty-free, 
> > > 	 sublicensable—but it's only used as outlined in the Terms of Use
> > > 	 and privacy policies."
> > > 
> > > So, if he didn't opt-out, it granted ChatGPT and its users a patent-free
> > > sublicensable code.
> > > 
> > > Ok, other LLM tools may have different terms, but if we end having
> > > to many people trying to monetize from it, the usage terms will be
> > > modified to prevent AI holders to face legal issues.
> > > 
> > > Still, while I'm not a lawyer, my understanding from the (2)
> > > is that if one uses it for closed source development and allowed
> > > implicitly or explicitly the inputs to be used for training, the one
> > > that will be be accounted for, in cases envolving IP leaking, is the
> > > person who submitted IP protected property to AI.  
> > 
> > Many of the AI players scrape the web, and might well pull in training
> > data from web pages having a restrictive copyright.  The AI's output
> > might then be influenced by that restricted training data. 
> 
> True, but this is not different than a developer seeking the web for
> answers of his development problems, reading textbooks and/or reading 
> articles.
> 
> Also, if someone publicly document something an any sort of media,
> it is expected that people will read, adquire knowledge from it and
> eventually materialize the acquired knowledge into something. This
> is fair use, and has some provision from Berne convention, although
> it may depend on each Country's specific laws.
> 
> On my view, if the trained data comes from lots of different
> places, as AI is actually a stochastic process that write
> code by predicting the next code words, if there's just one web 
> site with an specific pattern, the chances of getting exactly
> the same code are pretty low. It is a way more likely that humans
> would pick exactly the same code as written on his favorite
> textbook than an LLM feed with hundreds of thousands of web
> sites.

As I said in reply to a similar argument from James in this thread, I
do sympathize with this view and I do hope that it prevails.  However,
it is just as much wishful thinking for us as is the countering view that
goes something like "I want one euro for each time someone generates text
from an AI that might have been trained on my writings, and I deserve
that euro, and you all are going to pay me."

There are already lawsuits in flight that appear to be driven
by this philosophy, repugnant though that might be to all of us.
We simply do know know how the various courts will decide this issue.
And unfortunately, it would be completely foolish to assume that we have
a PR advantage over those seeking per-AI-output euros.

We should not risk the Linux kernel based on wishful thinking, and should
therefore exclude AI-generated code from it for the time being.

Or do you have an publicly available authoritative statement, perhaps
from the attorneys from the Linux Foundation, giving a competent legal
opinion that it is OK to accept AI-generated code into the Linux kernel?

> > Although we
> > might desperately want this not to be a problem for AI-based submissions
> > to the Linux kernel, what we want and what the various legal systems
> > actually give us are not guaranteed to have much relation to each other.
> 
> True, but that's not the point. AI is not that different than
> someone googling the net to seek for answers.

You and I can say that, but any given court of law might or might not
agree.

> The only difference is that, when AI is used, you won't know
> exactly from where the code was based.

And that is exactly one of the reasons why use of AI to generate Linux
kernel code is of greater risk than perusing references, whether on the
web or printed on dead trees.  One of the problems is that someone else
can likely work out *exactly* what that output code was based on.

> I agree that this could be problematic. But then, again, when a maintainer 
> picks a patch from someone else, the same applies: we don't have any
> guaranties that the code was not just copied-and-pasted from some place,
> except by the SoB.

Agreed, and I propose that we use SoB for the AI-generated case as well.

After all, the DCO explicitly states "I have the right to submit it
under the open source license indicated in the file".  In the case of
AI-generated code, just like for code from any other source, do you
know *for* *sure* that you have that right?  If not, you had better not
submit it.  Very simple.

> In any case (either AI, human or hybrid AI/human), if the code has issues,
> we may need to revert it.

This I agree with.

> On other words, AI doesn't radically changes it: at the end, all remains
> the same.

I agree with this, but only to the extent that the existing DCO says not
to submit code of unknown pedigree, whether from AI or from anything else.

> That's why I don't think we'll get any new information nor need to
> follow any procedure different than what we already do, if the developer
> had used AI, and to what extent.

I agree that our current procedures cover this case.  As a result, I am
shocked and dismayed that quite a few people seem to believe that it is
somehow OK to submit generic AI-generated code to the Linux kernel.

> Now, a completely different thing is if we start having "incompetent"
> developers ("incompetent" in the sense given by the Dilbert Principle) that
> have some AI bot patch-generator to write patches they can't do themselves.
> 
> I'll certainly reject such patches and place such individuals on my
> reject list.

I agree that we will always need to reserve the right to reject bad
patches, regardless of how they were created.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 20:38                     ` Jiri Kosina
@ 2025-08-21 21:18                       ` Jiri Kosina
  0 siblings, 0 replies; 97+ messages in thread
From: Jiri Kosina @ 2025-08-21 21:18 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Paul E. McKenney, James Bottomley, ksummit

On Thu, 21 Aug 2025, Jiri Kosina wrote:

> And one of the points why I originally brought this up is that I believe 
> we need either (a) be able to take the informed decision/risk by applying 
> a patch we know has been written by AI, or (b) be able to outright reject 
> it on that basis (e.g. if it's too complicated).

... and again, that's leaving all the legal aspects (which need to be 
figured out as well, of course) aside.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 18:01                           ` Mauro Carvalho Chehab
  2025-08-21 19:03                             ` Steven Rostedt
@ 2025-08-21 21:21                             ` Paul E. McKenney
  2025-08-21 21:32                               ` Steven Rostedt
  1 sibling, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-21 21:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Steven Rostedt, James Bottomley, Jiri Kosina, ksummit

On Thu, Aug 21, 2025 at 08:01:59PM +0200, Mauro Carvalho Chehab wrote:
> Em Thu, 21 Aug 2025 17:36:54 +0000
> "Luck, Tony" <tony.luck@intel.com> escreveu:
> 
> > > Do you remember the first time you saw that, and what copyrights
> > > were there? :-)  
> > 
> > Kernighan and Ritche "The C programming language" - First edition.
> 
> I saw it there too, but I probably saw it before that, on an "80 Micro"
> Magazine edition which I don't recall anymore.
> 
> Btw, Wikipedia says it came from a BCPL code (*). So, K&R were not the
> original authors.
> 
> (*) https://en.wikipedia.org/wiki/%22Hello,_World!%22_program
> 
> Anyway, the point is: i we weren't trained with such pattern, 
> a printf() with "Hey" or "Hi" would be a more likely answer.

But having engaged in some risky behavior in the past does not obligate
us to engate in risky behavior in the future.

In addition, in happy contrast to AI-generated output, I am not aware of
any in-flight lawsuits involving "Hello, World!".  So I don't find your
example at all applicable to the current situation with AI-generated
output.

> That's said, in the early programming days, I used a lot more
> this pattern (**):
> 
> 	The quick brown fox jumps over the lazy dog
> 
> (**) https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog
> 
> with has all 23 English characters. I have absolutely no glue when
> I first saw it, but it was before I got "C Programming Language" 
> book in hands, as I used it for a code I developed in Assembler
> before learning C.
> 
> Yet, as I saw a lot more the "Hello world", I haven't using the
> brown fox pattern for years.

"Sphynx of black quartz: Judge my vow!"

> Anyway, the point is: AI repeat patterns, but it will very likely
> repeat the ones that are used on tons of different places, where
> it is really hard to have any copyrights applied (as they become
> common sense). Humans do the same.

That might well be.  But AI output has been also known to include obscure
text.  And average behavior is not always helpful in legal matters.
For example, I would strongly advise against attempting to get out of a
speeding ticket by arguing that your average speed over the past month
was below the posted speed limit.  I suspect that you would find that
the court would instead look at your instantaneous speed at the time
in question.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 19:52                               ` Mauro Carvalho Chehab
@ 2025-08-21 21:23                                 ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-21 21:23 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Steven Rostedt, James Bottomley, Jiri Kosina, ksummit

On Thu, Aug 21, 2025 at 09:52:29PM +0200, Mauro Carvalho Chehab wrote:
> Em Thu, 21 Aug 2025 15:07:57 -0400
> Steven Rostedt <rostedt@goodmis.org> escreveu:
> 
> > On Thu, 21 Aug 2025 20:32:59 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > 
> > > Btw, even in the case of a bigger pattern you saw there and you
> > > may be repeating, you won't be the only one doing it: an entire
> > > generation that used K&C textbook are also repeating them. Plus
> > > the ones that used newer books whose authors got inspired from
> > > it.  
> > 
> > Since the authors actively encouraged people to use their examples, there's
> > no incentive to go after anyone.
> 
> True.
> 
> > > In practice, even with the original book's copyright, I doubt
> > > anyone could actually enforce copyrights if one picks one of the
> > > book's code and use as-is (and more likely one would adjust
> > > coding style, parameter pass logic, etc).  
> > 
> > I'm not so sure. But since most people who write coding books want people
> > to use their work, there's been no precedence on someone going after
> > someone for using code from a book (that I know of).
> > 
> > But there's a lot of assumptions in this thread, and I fear that those that
> > take a too lenient approach to AI may get burned by it.
> 
> AI is new, so yeah, there's always a risk. But then again, there's a
> risk already without it. I don't think the risk is too much different.
> Perhaps it is even lower, as all major companies are investing in AI,
> and they don't want to be sued. Plus, they're much more interested on
> the direct revenue AI can produce for them. So, probably there aren't
> much intent to try costly legal actions with low chances to monetize
> by going  after people using AI.

Let's please be realistic.  There isn't just a tiny bit of legal risk.
Instead, there are multiple lawsuits currently in flight on the status
of AI-generated material.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 21:21                             ` Paul E. McKenney
@ 2025-08-21 21:32                               ` Steven Rostedt
  2025-08-21 21:49                                 ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-08-21 21:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mauro Carvalho Chehab, Luck, Tony, James Bottomley, Jiri Kosina, ksummit

On Thu, 21 Aug 2025 14:21:13 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Yet, as I saw a lot more the "Hello world", I haven't using the
> > brown fox pattern for years.  
> 
> "Sphynx of black quartz: Judge my vow!"

I had to ask Gemini: "Make up a sentence with every English letter in it",
and it gave me:

   Since "the quick brown fox jumps over the lazy dog" and "pack my box with
   five dozen liquor jugs" are already well-known, here are a couple of
   different sentences that contain every letter of the alphabet:

   "Jinxed wizards pluck ivy from the big quivering sphinx."

   "Sympathizing would vex a quick jab from the crazy fox."

:-p

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 21:32                               ` Steven Rostedt
@ 2025-08-21 21:49                                 ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2025-08-21 21:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mauro Carvalho Chehab, Luck, Tony, James Bottomley, Jiri Kosina, ksummit

On Thu, Aug 21, 2025 at 05:32:07PM -0400, Steven Rostedt wrote:
> On Thu, 21 Aug 2025 14:21:13 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > Yet, as I saw a lot more the "Hello world", I haven't using the
> > > brown fox pattern for years.  
> > 
> > "Sphynx of black quartz: Judge my vow!"
> 
> I had to ask Gemini: "Make up a sentence with every English letter in it",
> and it gave me:
> 
>    Since "the quick brown fox jumps over the lazy dog" and "pack my box with
>    five dozen liquor jugs" are already well-known, here are a couple of
>    different sentences that contain every letter of the alphabet:
> 
>    "Jinxed wizards pluck ivy from the big quivering sphinx."
> 
>    "Sympathizing would vex a quick jab from the crazy fox."

E2BIG!  ;-) ;-) ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-21 17:30                       ` Mauro Carvalho Chehab
  2025-08-21 17:36                         ` Luck, Tony
  2025-08-21 17:53                         ` Steven Rostedt
@ 2025-08-22  7:55                         ` Geert Uytterhoeven
  2 siblings, 0 replies; 97+ messages in thread
From: Geert Uytterhoeven @ 2025-08-22  7:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Steven Rostedt, Paul E. McKenney, James Bottomley, Jiri Kosina, ksummit

Hi Mauro,

On Thu, 21 Aug 2025 at 19:32, Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
> Em Thu, 21 Aug 2025 12:50:37 -0400
> Steven Rostedt <rostedt@goodmis.org> escreveu:
> > On Thu, 21 Aug 2025 12:23:29 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > > Many of the AI players scrape the web, and might well pull in training
> > > > data from web pages having a restrictive copyright.  The AI's output
> > > > might then be influenced by that restricted training data.
> > >
> > > True, but this is not different than a developer seeking the web for
> > > answers of his development problems, reading textbooks and/or reading
> > > articles.
> >
> > The difference I believe is that AI is still a computer program. It could,
> > in theory, copy something exactly as is, where copyright does matter.
> >
> > If you read something and was able to rewrite it verbatim, you would be
> > subject to copyright infringement if what you read had limits on how you
> > could reproduce it.
>
> Maybe at the early days of LLM this could be true, but now that they're
> massively trained by bots, the number of places it retrieves data for
> its training is very large, and considering how artificial neurons
> work, they will only store patterns with a high number of repetitions.

How does it know which are reputable sources, and which are garbage?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-08-05 15:38 [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code Jiri Kosina
  2025-08-05 17:50 ` Sasha Levin
  2025-08-06  8:17 ` Dan Carpenter
@ 2025-09-15 18:01 ` Kees Cook
  2025-09-15 18:29   ` dan.j.williams
                     ` (2 more replies)
  2 siblings, 3 replies; 97+ messages in thread
From: Kees Cook @ 2025-09-15 18:01 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: ksummit

On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> I believe we need to at least settle on (and document) the way how to 
> express in patch (meta)data:
> 
> - this patch has been assisted by LLM $X
> - the human understanding the generated code is $Y

A lot was covered in this thread already, but I noticed that much of the
discussion ended up looking at LLM assistance from the perspective of
"one shot" execution, either as a comparison against other "mechanical"
transformation tools (like Coccinelle) or from the perspective of bulk
code creation ("the LLM wrote it all").

What I didn't see discussed, and where I think there is substantially
greater utility to be had from LLMs, is contributions created during long
running sessions. Such sessions might include more than "just" writing
code, like doing builds, running tests, or helping with debugging (like
driving gdb analysis of kernel crashes). I think this scenario makes
things much more complex to "declare" in a commit log.

(Some examples from me: "security fix I drove the LLM to make"[1], "I
worked interactively with an LLM to construct API testing coverage"[2]
and "I used the LLM to find missed conversion instances and do cross-arch
build and test validation"[3].)

The awkward analogy I have is that of carving a fish out of drift
wood: I picked the wood, and then used a chainsaw, chisel, and knife
to remove everything that wasn't the fish I wanted. Normally only the
final result is shown. For more complex creations, I might describe why
I made various choices. If I'm asked to describe "how I used the chisel"
it quickly becomes murky: it was one of several tools used, and its use
depended on other tools and other choices and the state of the sculpture
at any given time.

So, what I mean to say is it's certainly useful to declare "I used a
chisel", but that for long running sessions it becomes kind of pointless
to include much more than a general gist of what the process was. This
immediately gets at the "trust" part of this thread making the mentioned
"human understanding the generated code" a central issue. How should that
be expressed? Our existing commit logs don't do a lot of "show your work"
right now, but rather focus on the why/what of a change, and less "how did
I write this". It's not strictly absent (some commit logs discuss what
alternatives were tried and eliminated, for example), but we've tended
to look only at final results and instead use trust in contributors as
a stand-in for "prove to me you understand what you've changed".

It seems like a "show your work" approach for commit logs would be
valuable regardless of tools involved. I've been struggling to find a
short way to describe this, though. Initially I thought we wanted to
ask "Why is this contribution correct?" but we actually already expect
that to be answered in the commit log. We want something more specific,
like "How did you construct this solution?" But that is unlikely to be
distilled into a trailer tag.

-Kees

[1] https://lore.kernel.org/lkml/20250724080756.work.741-kees@kernel.org/
[2] https://lore.kernel.org/lkml/20250717085156.work.363-kees@kernel.org/
[3] https://lore.kernel.org/lkml/20250804163910.work.929-kees@kernel.org/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-15 18:01 ` Kees Cook
@ 2025-09-15 18:29   ` dan.j.williams
  2025-09-16 15:36     ` James Bottomley
  2025-09-16  9:39   ` Jiri Kosina
  2025-09-16 14:20   ` Steven Rostedt
  2 siblings, 1 reply; 97+ messages in thread
From: dan.j.williams @ 2025-09-15 18:29 UTC (permalink / raw)
  To: Kees Cook, Jiri Kosina; +Cc: ksummit

Kees Cook wrote:
> On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
[..]
> It seems like a "show your work" approach for commit logs would be
> valuable regardless of tools involved. I've been struggling to find a
> short way to describe this, though. Initially I thought we wanted to
> ask "Why is this contribution correct?" but we actually already expect
> that to be answered in the commit log. We want something more specific,
> like "How did you construct this solution?" But that is unlikely to be
> distilled into a trailer tag.

Is this something more than "declare assumptions and tradeoffs"? One of
the trust smells of a patchset is understanding earnest alternatives,
and the author's willingness to entertain alternatives.

If a submitter is not prepared to to argue *against* the patch being
included in its current form, then that can indicate more homework is
needed.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-15 18:01 ` Kees Cook
  2025-09-15 18:29   ` dan.j.williams
@ 2025-09-16  9:39   ` Jiri Kosina
  2025-09-16 15:31     ` James Bottomley
  2025-09-16 14:20   ` Steven Rostedt
  2 siblings, 1 reply; 97+ messages in thread
From: Jiri Kosina @ 2025-09-16  9:39 UTC (permalink / raw)
  To: Kees Cook; +Cc: ksummit

On Mon, 15 Sep 2025, Kees Cook wrote:

> So, what I mean to say is it's certainly useful to declare "I used a
> chisel", but that for long running sessions it becomes kind of pointless
> to include much more than a general gist of what the process was. This
> immediately gets at the "trust" part of this thread making the mentioned
> "human understanding the generated code" a central issue. How should that
> be expressed? Our existing commit logs don't do a lot of "show your work"
> right now, but rather focus on the why/what of a change, and less "how did
> I write this". It's not strictly absent (some commit logs discuss what
> alternatives were tried and eliminated, for example), but we've tended
> to look only at final results and instead use trust in contributors as
> a stand-in for "prove to me you understand what you've changed".

Thanks, I understand your point.

I, however, don't think I as a maintainer care at all whether the patch 
has been "assisted by" some LLM when it comes to proposing testing 
scanarios and testcases, managing the testing results, yada yada yada.

If the patch author wishes to express that in one way or the other, just a 
freetext form in the commit log is completely fine for me.

I don't think we care about that aspect direcly neither from "maintainer 
workflow" nor legal perspectives (IANAL, of course).

But we do (or should, I believe) care about the actual submitted code 
having been produced by it, for both of the the reasons above.

Thanks,

-- 
Jiri Kosina


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-15 18:01 ` Kees Cook
  2025-09-15 18:29   ` dan.j.williams
  2025-09-16  9:39   ` Jiri Kosina
@ 2025-09-16 14:20   ` Steven Rostedt
  2025-09-16 15:00     ` Mauro Carvalho Chehab
  2025-09-16 23:30     ` Kees Cook
  2 siblings, 2 replies; 97+ messages in thread
From: Steven Rostedt @ 2025-09-16 14:20 UTC (permalink / raw)
  To: Kees Cook; +Cc: Jiri Kosina, ksummit

On Mon, 15 Sep 2025 11:01:46 -0700
Kees Cook <kees@kernel.org> wrote:

> So, what I mean to say is it's certainly useful to declare "I used a
> chisel", but that for long running sessions it becomes kind of pointless
> to include much more than a general gist of what the process was. This
> immediately gets at the "trust" part of this thread making the mentioned
> "human understanding the generated code" a central issue. How should that
> be expressed? Our existing commit logs don't do a lot of "show your work"
> right now, but rather focus on the why/what of a change, and less "how did
> I write this". It's not strictly absent (some commit logs discuss what
> alternatives were tried and eliminated, for example), but we've tended
> to look only at final results and instead use trust in contributors as
> a stand-in for "prove to me you understand what you've changed".

I don't think anyone cares if you used AI to help you understand the
situation or to test your work. But if you had a robot build you the fish
and you handed that in as your own work, that would be deceptive.

Saying "this patch has been assisted by LLM $X" is quite too vague and I
don't think that's necessary for most cases. It's only necessary if the AI
created code for you that is beyond the normal "completion" (like filling
out your for loop syntax). I like to use a quick sort example. If you ask
AI to "give me a quick sort routine", that should definitely be expressed
in the change log.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16 14:20   ` Steven Rostedt
@ 2025-09-16 15:00     ` Mauro Carvalho Chehab
  2025-09-16 15:48       ` Steven Rostedt
  2025-09-16 23:30     ` Kees Cook
  1 sibling, 1 reply; 97+ messages in thread
From: Mauro Carvalho Chehab @ 2025-09-16 15:00 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Kees Cook, Jiri Kosina, ksummit

On Tue, Sep 16, 2025 at 10:20:22AM -0400, Steven Rostedt wrote:
> On Mon, 15 Sep 2025 11:01:46 -0700
> Kees Cook <kees@kernel.org> wrote:
> 
> > So, what I mean to say is it's certainly useful to declare "I used a
> > chisel", but that for long running sessions it becomes kind of pointless
> > to include much more than a general gist of what the process was. This
> > immediately gets at the "trust" part of this thread making the mentioned
> > "human understanding the generated code" a central issue. How should that
> > be expressed? Our existing commit logs don't do a lot of "show your work"
> > right now, but rather focus on the why/what of a change, and less "how did
> > I write this". It's not strictly absent (some commit logs discuss what
> > alternatives were tried and eliminated, for example), but we've tended
> > to look only at final results and instead use trust in contributors as
> > a stand-in for "prove to me you understand what you've changed".
> 
> I don't think anyone cares if you used AI to help you understand the
> situation or to test your work. But if you had a robot build you the fish
> and you handed that in as your own work, that would be deceptive.

Agreed.

> Saying "this patch has been assisted by LLM $X" is quite too vague and I
> don't think that's necessary for most cases. It's only necessary if the AI
> created code for you that is beyond the normal "completion" (like filling
> out your for loop syntax). I like to use a quick sort example. If you ask
> AI to "give me a quick sort routine", that should definitely be expressed
> in the change log.

Agreed with the concept. Yet, asking AI to implement a quick sort routine
which is widely documented on several textbooks - or some other very common
algorithm with dozens of GPLv2 (and even public domain) code examples
is probably fine. Now, if one asks AI to implement the very latest fancy
sort algorithm from most recent published papers, then this is problematic.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16  9:39   ` Jiri Kosina
@ 2025-09-16 15:31     ` James Bottomley
  0 siblings, 0 replies; 97+ messages in thread
From: James Bottomley @ 2025-09-16 15:31 UTC (permalink / raw)
  To: Jiri Kosina, Kees Cook; +Cc: ksummit

On Tue, 2025-09-16 at 11:39 +0200, Jiri Kosina wrote:
> On Mon, 15 Sep 2025, Kees Cook wrote:
> 
> > So, what I mean to say is it's certainly useful to declare "I used
> > a chisel", but that for long running sessions it becomes kind of
> > pointless to include much more than a general gist of what the
> > process was. This immediately gets at the "trust" part of this
> > thread making the mentioned "human understanding the generated
> > code" a central issue. How should that be expressed? Our existing
> > commit logs don't do a lot of "show your work" right now, but
> > rather focus on the why/what of a change, and less "how did
> > I write this". It's not strictly absent (some commit logs discuss
> > what alternatives were tried and eliminated, for example), but
> > we've tended to look only at final results and instead use trust in
> > contributors as a stand-in for "prove to me you understand what
> > you've changed".
> 
> Thanks, I understand your point.
> 
> I, however, don't think I as a maintainer care at all whether the
> patch has been "assisted by" some LLM when it comes to proposing
> testing scanarios and testcases, managing the testing results, yada
> yada yada.

I think historians might care to have this record, so from that point
of view it might be useful to preserve but ...

> If the patch author wishes to express that in one way or the other,
> just a freetext form in the commit log is completely fine for me.
> 
> I don't think we care about that aspect direcly neither from
> "maintainer workflow" nor legal perspectives (IANAL, of course).

From the legal point of view we have to remember that copyright only
protects expression, not ideas.  So if AI gave you the idea, in the
same way as going to a conference or reading about it in a paper, but
you wrote the code, the copyright is all yours and nothing needs
recording in the signoff chain.  It's only if some of the actual
expression the AI gave made its way into the commit that we'd need it
documenting in the signoff chain.

> But we do (or should, I believe) care about the actual submitted code
> having been produced by it, for both of the the reasons above.

I agree with you in principle (as outlined above).  However, in
practice there is a legal grey area over what constitutes original
expression vs copying.  For example, if you're asking AI about finding
say a memory region and it advises you to use a sort and recommends
quicksort, that's not copying because it never produced any code
(expression) and only contributed ideas.  However, if as part of this
back and forth AI produced quicksort code which you implemented in a
substantially similar way in the kernel several weeks later, there
would be an argument to be made that it was copying not original
expression.  In that latter case it would be helpful later to have a
contemporaneous record that AI produced some code which the patch
author looked at but then implemented their own separate version. 
Putting this in the commit log would seem to be an ideal place to
preserve it.

Regards,

James


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-15 18:29   ` dan.j.williams
@ 2025-09-16 15:36     ` James Bottomley
  0 siblings, 0 replies; 97+ messages in thread
From: James Bottomley @ 2025-09-16 15:36 UTC (permalink / raw)
  To: dan.j.williams, Kees Cook, Jiri Kosina; +Cc: ksummit

On Mon, 2025-09-15 at 11:29 -0700, dan.j.williams@intel.com wrote:
> Kees Cook wrote:
> > On Tue, Aug 05, 2025 at 05:38:36PM +0200, Jiri Kosina wrote:
> [..]
> > It seems like a "show your work" approach for commit logs would be
> > valuable regardless of tools involved. I've been struggling to find
> > a short way to describe this, though. Initially I thought we wanted
> > to ask "Why is this contribution correct?" but we actually already
> > expect that to be answered in the commit log. We want something
> > more specific, like "How did you construct this solution?" But that
> > is unlikely to be distilled into a trailer tag.
> 
> Is this something more than "declare assumptions and tradeoffs"? One
> of the trust smells of a patchset is understanding earnest
> alternatives, and the author's willingness to entertain alternatives.
> 
> If a submitter is not prepared to to argue *against* the patch being
> included in its current form, then that can indicate more homework is
> needed.

I agree this is necessary for a submitter to engage in the patch
process, but I would argue it's not sufficient to satisfy concerns
about AI content of the patch because there are many instances of one
individual successfully taking over another's patch set for the
purposes of putting it upstream (and thus making the arguments above).

Regards,

James


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16 15:00     ` Mauro Carvalho Chehab
@ 2025-09-16 15:48       ` Steven Rostedt
  2025-09-16 16:06         ` Luck, Tony
  0 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2025-09-16 15:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Jiri Kosina, ksummit

On Tue, 16 Sep 2025 17:00:37 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Agreed with the concept. Yet, asking AI to implement a quick sort routine
> which is widely documented on several textbooks - or some other very common
> algorithm with dozens of GPLv2 (and even public domain) code examples
> is probably fine. Now, if one asks AI to implement the very latest fancy
> sort algorithm from most recent published papers, then this is problematic.

Perhaps we need a way to say "Hey, AI, give me a sort routine that is
compatible with the GPLv2 license" and then hope that it actually gives
you that! ;-)

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16 15:48       ` Steven Rostedt
@ 2025-09-16 16:06         ` Luck, Tony
  2025-09-16 16:58           ` H. Peter Anvin
  0 siblings, 1 reply; 97+ messages in thread
From: Luck, Tony @ 2025-09-16 16:06 UTC (permalink / raw)
  To: Steven Rostedt, Mauro Carvalho Chehab; +Cc: Jiri Kosina, ksummit

> Perhaps we need a way to say "Hey, AI, give me a sort routine that is
> compatible with the GPLv2 license" and then hope that it actually gives
> you that! ;-)

Current generation AI just gives output that looks like an answer to the
prompt. So, in this case it might just slap

// SPDX-License-Identifier: GPL-2.0-only

on the output and call itself successful.

If you want to be sure, you'd have to pay to train an AI model
specifically on just GPL code and use that rather than some
generic AI model trained on everything that could be scraped.

-Tony

^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16 16:06         ` Luck, Tony
@ 2025-09-16 16:58           ` H. Peter Anvin
  0 siblings, 0 replies; 97+ messages in thread
From: H. Peter Anvin @ 2025-09-16 16:58 UTC (permalink / raw)
  To: Luck, Tony, Steven Rostedt, Mauro Carvalho Chehab; +Cc: Jiri Kosina, ksummit

On September 16, 2025 9:06:01 AM PDT, "Luck, Tony" <tony.luck@intel.com> wrote:
>> Perhaps we need a way to say "Hey, AI, give me a sort routine that is
>> compatible with the GPLv2 license" and then hope that it actually gives
>> you that! ;-)
>
>Current generation AI just gives output that looks like an answer to the
>prompt. So, in this case it might just slap
>
>// SPDX-License-Identifier: GPL-2.0-only
>
>on the output and call itself successful.
>
>If you want to be sure, you'd have to pay to train an AI model
>specifically on just GPL code and use that rather than some
>generic AI model trained on everything that could be scraped.
>
>-Tony
>
>

Now repeat that for every license subset that someone cares about. This is something that AI clearly doesn't hold a candle to humans yet: the ability to know what they don't know, and the ability to know what they *shouldn't* know in a particular context.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16 14:20   ` Steven Rostedt
  2025-09-16 15:00     ` Mauro Carvalho Chehab
@ 2025-09-16 23:30     ` Kees Cook
  2025-09-17 15:16       ` Steven Rostedt
  2025-09-17 17:02       ` Laurent Pinchart
  1 sibling, 2 replies; 97+ messages in thread
From: Kees Cook @ 2025-09-16 23:30 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Kosina, ksummit

On Tue, Sep 16, 2025 at 10:20:22AM -0400, Steven Rostedt wrote:
> On Mon, 15 Sep 2025 11:01:46 -0700
> Kees Cook <kees@kernel.org> wrote:
> 
> > So, what I mean to say is it's certainly useful to declare "I used a
> > chisel", but that for long running sessions it becomes kind of pointless
> > to include much more than a general gist of what the process was. This
> > immediately gets at the "trust" part of this thread making the mentioned
> > "human understanding the generated code" a central issue. How should that
> > be expressed? Our existing commit logs don't do a lot of "show your work"
> > right now, but rather focus on the why/what of a change, and less "how did
> > I write this". It's not strictly absent (some commit logs discuss what
> > alternatives were tried and eliminated, for example), but we've tended
> > to look only at final results and instead use trust in contributors as
> > a stand-in for "prove to me you understand what you've changed".
> 
> I don't think anyone cares if you used AI to help you understand the
> situation or to test your work. But if you had a robot build you the fish
> and you handed that in as your own work, that would be deceptive.

Right, but the LLMs aren't used strictly as a "workflow" assistant. Do
we want to say "I used the chisel to remove a big hunk of wood you can't
see at all in the fish."

Perhaps the issue is to just over-explain when the LLM is in use for
now, and we (as a developer community) will collectively figure out what
turns out to be unimportant or redundant over time. But this can't be
done with a trailer tag: we're going to need relatively verbose notes
in the commit log.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16 23:30     ` Kees Cook
@ 2025-09-17 15:16       ` Steven Rostedt
  2025-09-17 17:02       ` Laurent Pinchart
  1 sibling, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2025-09-17 15:16 UTC (permalink / raw)
  To: Kees Cook; +Cc: Jiri Kosina, ksummit

On Tue, 16 Sep 2025 16:30:30 -0700
Kees Cook <kees@kernel.org> wrote:

> Perhaps the issue is to just over-explain when the LLM is in use for
> now, and we (as a developer community) will collectively figure out what
> turns out to be unimportant or redundant over time. But this can't be
> done with a trailer tag: we're going to need relatively verbose notes
> in the commit log.

Do we need that much notes in the change log? Really, I think the only time
it is an issue is if AI wrote any non-trivial code. And what I mean by
non-trivial, it is pretty much anything other than auto complete.

But if it was used in tooling that doesn't show up in the actual patch, I
don't think it needs to be mentioned unless the developer wants to share
how they came up with the code.

For example, if I run spatch on code and it finds issues, and I fix it. I
may mention "this was found via spatch". But I don't go into too much
details. If you say "X LLM discovered this",  that would be nice, but I
don't think it is mandatory.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
  2025-09-16 23:30     ` Kees Cook
  2025-09-17 15:16       ` Steven Rostedt
@ 2025-09-17 17:02       ` Laurent Pinchart
  1 sibling, 0 replies; 97+ messages in thread
From: Laurent Pinchart @ 2025-09-17 17:02 UTC (permalink / raw)
  To: Kees Cook; +Cc: Steven Rostedt, Jiri Kosina, ksummit

On Tue, Sep 16, 2025 at 04:30:30PM -0700, Kees Cook wrote:
> On Tue, Sep 16, 2025 at 10:20:22AM -0400, Steven Rostedt wrote:
> > On Mon, 15 Sep 2025 11:01:46 -0700 Kees Cook wrote:
> > 
> > > So, what I mean to say is it's certainly useful to declare "I used a
> > > chisel", but that for long running sessions it becomes kind of pointless
> > > to include much more than a general gist of what the process was. This
> > > immediately gets at the "trust" part of this thread making the mentioned
> > > "human understanding the generated code" a central issue. How should that
> > > be expressed? Our existing commit logs don't do a lot of "show your work"
> > > right now, but rather focus on the why/what of a change, and less "how did
> > > I write this". It's not strictly absent (some commit logs discuss what
> > > alternatives were tried and eliminated, for example), but we've tended
> > > to look only at final results and instead use trust in contributors as
> > > a stand-in for "prove to me you understand what you've changed".
> > 
> > I don't think anyone cares if you used AI to help you understand the
> > situation or to test your work. But if you had a robot build you the fish
> > and you handed that in as your own work, that would be deceptive.
> 
> Right, but the LLMs aren't used strictly as a "workflow" assistant. Do
> we want to say "I used the chisel to remove a big hunk of wood you can't
> see at all in the fish."
> 
> Perhaps the issue is to just over-explain when the LLM is in use for
> now, and we (as a developer community) will collectively figure out what
> turns out to be unimportant or redundant over time. But this can't be
> done with a trailer tag: we're going to need relatively verbose notes
> in the commit log.

This conversation has gone in many directions, it seems to be time to
try and refocus. One way to do so would be to focus on *why* we want
those annotations. That may help then deciding how to best annotate
patches to reach the intended goals.

Personally, I care about knowing the possible LLM origin of a patch for
two main reasons:

- From a maintainer point of view, to focus my reviews. I will look for
  different error patterns, or will interpret patterns differently for a
  human author compared to an LLM.

  This mostly focusses on code produced by LLMs. Usage of LLMs to
  understand an API or review a piece of code before submission is less
  of a concern.

  I don't specifically need a commit trailer for this purpose, a
  free-formed explanation in the commit message would be enough. I would
  rely on trust that submitters will be honest there, the same way we
  rely on trust that a submitter has no malicious or selfish intent in
  general. We of course stay alert to detect breaches of that trust
  today, and I wouldn't see a need to change the process there: we
  already deal with situations where a submitter is determined not to be
  trustworthy.

- From a legal point of view, to detect code that may not be compatible
  with the GPL license. This is still a legal grey area, if we want to
  accept code generated by LLMs without waiting for courts around the
  world to make clear decisions, we can't ignore that there's a legal
  risk.

  For this purpose, a commit trailer would be useful, in order to easily
  list affected commits.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2025-09-17 17:03 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-05 15:38 [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code Jiri Kosina
2025-08-05 17:50 ` Sasha Levin
2025-08-05 18:00   ` Laurent Pinchart
2025-08-05 18:16     ` Sasha Levin
2025-08-05 21:53       ` Jiri Kosina
2025-08-05 22:41       ` Laurent Pinchart
2025-08-05 18:34     ` Lorenzo Stoakes
2025-08-05 22:06     ` Alexandre Belloni
2025-08-05 18:32   ` Lorenzo Stoakes
2025-08-08  8:31   ` Krzysztof Kozlowski
2025-08-11 21:46     ` Paul E. McKenney
2025-08-11 21:57       ` Luck, Tony
2025-08-11 22:12         ` Paul E. McKenney
2025-08-11 22:45           ` H. Peter Anvin
2025-08-11 22:52             ` Paul E. McKenney
2025-08-11 22:54           ` Jonathan Corbet
2025-08-11 23:03             ` Paul E. McKenney
2025-08-12 15:47               ` Steven Rostedt
2025-08-12 16:06                 ` Paul E. McKenney
2025-08-11 22:28         ` Sasha Levin
2025-08-12 15:49           ` Steven Rostedt
2025-08-12 16:03             ` Krzysztof Kozlowski
2025-08-12 16:12               ` Paul E. McKenney
2025-08-12 16:17                 ` Krzysztof Kozlowski
2025-08-12 17:12                   ` Steven Rostedt
2025-08-12 17:39                     ` Paul E. McKenney
2025-08-11 22:11       ` Luis Chamberlain
2025-08-11 22:51         ` Paul E. McKenney
2025-08-11 23:22           ` Luis Chamberlain
2025-08-11 23:42             ` Paul E. McKenney
2025-08-12  0:02               ` Luis Chamberlain
2025-08-12  2:49                 ` Paul E. McKenney
2025-08-18 21:41             ` Mauro Carvalho Chehab
2025-08-20 21:48               ` Paul E. McKenney
2025-08-12 16:01           ` Steven Rostedt
2025-08-12 16:22             ` Paul E. McKenney
2025-08-18 21:23           ` Mauro Carvalho Chehab
2025-08-19 15:25             ` Paul E. McKenney
2025-08-19 16:27               ` Mauro Carvalho Chehab
2025-08-20 22:03                 ` Paul E. McKenney
2025-08-21 10:54                   ` Miguel Ojeda
2025-08-21 11:46                     ` Mauro Carvalho Chehab
2025-08-12  8:38       ` James Bottomley
2025-08-12 13:15         ` Bird, Tim
2025-08-12 14:31           ` Greg KH
2025-08-18 21:12           ` Mauro Carvalho Chehab
2025-08-19 15:01             ` Paul E. McKenney
2025-08-12 14:42         ` Paul E. McKenney
2025-08-12 15:55           ` Laurent Pinchart
2025-08-18 21:07           ` Mauro Carvalho Chehab
2025-08-19 15:15             ` Paul E. McKenney
2025-08-19 15:23             ` James Bottomley
2025-08-19 16:16               ` Mauro Carvalho Chehab
2025-08-20 21:44                 ` Paul E. McKenney
2025-08-21 10:23                   ` Mauro Carvalho Chehab
2025-08-21 16:50                     ` Steven Rostedt
2025-08-21 17:30                       ` Mauro Carvalho Chehab
2025-08-21 17:36                         ` Luck, Tony
2025-08-21 18:01                           ` Mauro Carvalho Chehab
2025-08-21 19:03                             ` Steven Rostedt
2025-08-21 19:45                               ` Mauro Carvalho Chehab
2025-08-21 21:21                             ` Paul E. McKenney
2025-08-21 21:32                               ` Steven Rostedt
2025-08-21 21:49                                 ` Paul E. McKenney
2025-08-21 17:53                         ` Steven Rostedt
2025-08-21 18:32                           ` Mauro Carvalho Chehab
2025-08-21 19:07                             ` Steven Rostedt
2025-08-21 19:52                               ` Mauro Carvalho Chehab
2025-08-21 21:23                                 ` Paul E. McKenney
2025-08-22  7:55                         ` Geert Uytterhoeven
2025-08-21 20:38                     ` Jiri Kosina
2025-08-21 21:18                       ` Jiri Kosina
2025-08-21 20:46                     ` Paul E. McKenney
2025-08-18 17:53         ` Rafael J. Wysocki
2025-08-18 18:32           ` James Bottomley
2025-08-19 15:14             ` Paul E. McKenney
2025-08-18 19:13         ` Mauro Carvalho Chehab
2025-08-18 19:19           ` Jiri Kosina
2025-08-18 19:44             ` Rafael J. Wysocki
2025-08-18 19:47               ` Jiri Kosina
2025-08-18 22:44                 ` Laurent Pinchart
2025-08-06  8:17 ` Dan Carpenter
2025-08-06 10:13   ` Mark Brown
2025-08-12 14:36     ` Ben Dooks
2025-09-15 18:01 ` Kees Cook
2025-09-15 18:29   ` dan.j.williams
2025-09-16 15:36     ` James Bottomley
2025-09-16  9:39   ` Jiri Kosina
2025-09-16 15:31     ` James Bottomley
2025-09-16 14:20   ` Steven Rostedt
2025-09-16 15:00     ` Mauro Carvalho Chehab
2025-09-16 15:48       ` Steven Rostedt
2025-09-16 16:06         ` Luck, Tony
2025-09-16 16:58           ` H. Peter Anvin
2025-09-16 23:30     ` Kees Cook
2025-09-17 15:16       ` Steven Rostedt
2025-09-17 17:02       ` Laurent Pinchart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox