Date: Thu, 21 Aug 2025 12:50:37 -0400
From: Steven Rostedt
To: Mauro Carvalho Chehab
Cc: "Paul E. McKenney", James Bottomley, Jiri Kosina, ksummit@lists.linux.dev
Subject: Re: [MAINTAINERS SUMMIT] Annotating patches containing AI-assisted code
Message-ID: <20250821125037.5cf5be3d@gandalf.local.home>
In-Reply-To: <20250821122329.03c77178@foz.lan>

On Thu, 21 Aug 2025 12:23:29 +0200
Mauro Carvalho Chehab wrote:

> > Many of the AI players scrape the web, and might well pull in training
> > data from web pages having a restrictive copyright. The AI's output
> > might then be influenced by that restricted training data.
>
> True, but this is no different from a developer searching the web for
> answers to his development problems, reading textbooks and/or reading
> articles.

The difference, I believe, is that AI is still a computer program.
It could, in theory, copy something exactly as is, where copyright does
matter. If you read something and were able to rewrite it verbatim, you
would be liable for copyright infringement if what you read had limits
on how it could be reproduced.

>
> Also, if someone publicly documents something on any sort of media,
> it is expected that people will read it, acquire knowledge from it and
> eventually materialize the acquired knowledge into something. This
> is fair use, and has some provision in the Berne Convention, although
> it may depend on each country's specific laws.

You can learn from it, but it also comes down to how much you actually
copy from it.

>
> In my view, if the training data comes from lots of different
> places, since AI is actually a stochastic process that writes
> code by predicting the next code words, if there's just one web
> site with a specific pattern, the chances of getting exactly
> the same code are pretty low. It is way more likely that a human
> would pick exactly the same code as written in his favorite
> textbook than an LLM fed with hundreds of thousands of web
> sites.

The issue I have with the above statement is, how would you know if the
AI copied something verbatim or not? Are you going to ask it? "Hey, AI,
was this code a direct copy of anything?" Would you trust its answer?

For a human to do the same, they would have to have knowingly made the
copy.

-- Steve