From mboxrd@z Thu Jan 1 00:00:00 1970
Sender: Guenter Roeck
Date: Tue, 27 Aug 2019 11:50:27 -0700
From: Guenter Roeck
To: Geert Uytterhoeven
Message-ID: <20190827185027.GA15384@roeck-us.net>
References: <20190826230206.GC28066@mit.edu>
 <20190827134836.GB25038@kroah.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Cc: Joel Fernandes, Barret Rhoden, ksummit, Greg Kroah-Hartman,
 Jonathan Nieder, Tomasz Figa, Han-Wen Nienhuys, Theodore Tso,
 David Rientjes, Dmitry Torokhov, Dmitry Vyukov
Subject: Re: [Ksummit-discuss] Allowing something Change-Id (or something like it) in kernel commits

On Tue, Aug 27, 2019 at 07:34:45PM +0200, Geert Uytterhoeven wrote:
> Hi Günter,
>
> On Tue, Aug 27, 2019 at 4:01 PM Guenter Roeck wrote:
> > On 8/27/19 6:48 AM, Greg Kroah-Hartman wrote:
> > > On Tue, Aug 27, 2019 at 06:24:36AM -0700, Dmitry Vyukov wrote:
> > >> On Mon, Aug 26, 2019 at 11:06 PM Thomas Gleixner wrote:
> > >>> On Mon, 26 Aug 2019, Dmitry Vyukov wrote:
> > >>>> A somewhat related point re UUID/Change-ID.
> > >>>> For syzbot (or any other bug tracking system) we want to associate
> > >>>> bugs with fixes. It turned out there is no good identity of a change
> > >>>> that we could use.
> > >>>> Commit hash is an obvious first thing to consider,
> > >>>> but (1) it changes in linux-next, (2) sometimes the change is not
> > >>>> committed yet when we do the association, (3) it is different when
> > >>>> backported to LTS (so not possible to say if a fix is in that stable
> > >>>> tree or not).
> > >>>> We decided to use commit subject, which works to some degree, but also
> > >>>> has problems: (1) not necessarily unique, (2) sometimes people change
> > >>>> subject during backporting (e.g. prepend some prefix), (3) has all the
> > >>>> same problems of email clients messing with text (e.g. I can't issue
> > >>>> #syz fix command for too long commit subjects with my email client).
> > >>>> Some real UUID/Change-ID would solve all of these problems by giving
> > >>>> us capability to refer to changes rather than a commit in a particular
> > >>>> tree only.
> > >>>
> > >>> If we adopt the Link: ..../$MSG tag widely then you have a UUID.
> > >>
> > >> Is there a way to ensure that everybody will generate right IDs
> > >> (ChangeID-Version) and then a link in canonical form will be included
> > >> into commit? As far as I understand this is not possible with the
> > >> current kernel tooling, as this aspect is not under control of any
> > >> unified tooling.
> > >> I see different maintainers use links to different archive web sites.
> > >> Also sometimes Link is present for other reasons (e.g. link to bug
> > >> report).
> > >> The link will need to be added by every developer (rather than
> > >> maintainer) so that it's available before the change is committed
> > >> anywhere.
> > >
> > > For subsystems I maintain, I am already adding the Link: tag to
> > > lore.kernel.org with the message id in it. That is automatically added
> > > by my scripts.
> > >
> > >> Though, most of these are problems for any other change identification scheme...
> > >
> > > Note, we have 4000+ developers every year, it's hard enough to get them
> > > all to agree on major things, let alone crazy stuff like this :)
> > >
> >
> > Is it really that crazy ?
> >
> > I have to use a combination of subject analysis and patch content analysis
> > using fuzzy text / string comparison, combined with an analysis of the patch
> > description, to answer a simple question: Is this patch upstream, and what is
> > its upstream SHA ? Having a UUID tag would make this a simple and
>
> I typically use "git cherry -v" to check if a patch is upstream.
> Yes, this may miss a patch that was changed. But that can be a good thing.
>

I use that as well, as "first line of defense". But it doesn't always work,
for example if a patch was applied to a downstream branch while its upstream
review was still going on, or for backports which required conflict resolutions
or context changes.

> > straightforward operation. What is crazy is having to do all this analysis.
>
> What happens to the UUID when a patch is split in two parts?
> If a part is applied with the same UUID, that would give the false impression
> that the original full patch was applied.
>

Presumably one would/should drop the original UUID in such a situation.
One should also not blindly trust a uuid, but still perform an automated
and/or manual review. The key would be to have the ability to identify
a patch without having to search through all of them.

In practice, I use a fuzzy match on the patch subject today, followed by
another fuzzy match on the patch content. Both are expensive, even with
parallel search, especially if one has to search through 100k+ patches.
The subject match is also error prone - a computer has a different opinion
about "match" than the human brain, and a seemingly innocent subject change
may result in a complete mismatch. Also, yes, there may be multiple patches
with the same or almost the same subject.
The ability to replace or at least augment that part of the search with
a uuid match would be extremely helpful.

> At least rebasing (using git rebase) your submissions against upstream will
> keep the part that still hasn't been applied.
>

Unfortunately, a straight rebase doesn't work for a number of reasons.
In a nutshell we have too many local patches to deal with, in too many
areas of the kernel. Also, if a downstream patch differs from its upstream
version, that deviation is in almost all cases not what we want or need.
We have to use a more intelligent (guided) approach.

Guenter
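
For illustration only, the lookup order discussed above (identifier match
first, fuzzy subject match as a fallback) could be sketched roughly as below.
Everything here is an assumption: the "Change-Id:" trailer name, the Commit
record, the helper names and the 0.9 similarity threshold are hypothetical and
not part of any existing kernel tooling. Whatever comes back is a list of
candidates that still needs automated and/or manual review.

```python
import difflib
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical trailer name; the thread has not agreed on one.
CHANGE_ID_RE = re.compile(r"^Change-Id:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


@dataclass
class Commit:
    sha: str
    subject: str
    body: str


def change_id(commit: Commit) -> Optional[str]:
    """Return the identifier from the commit body, or None if absent."""
    match = CHANGE_ID_RE.search(commit.body)
    return match.group(1) if match else None


def build_index(upstream: List[Commit]) -> Dict[str, List[Commit]]:
    """Map each identifier to the upstream commits carrying it."""
    index: Dict[str, List[Commit]] = {}
    for commit in upstream:
        cid = change_id(commit)
        if cid is not None:
            index.setdefault(cid, []).append(commit)
    return index


def find_upstream(patch: Commit, upstream: List[Commit],
                  index: Dict[str, List[Commit]],
                  threshold: float = 0.9) -> List[Commit]:
    """Return upstream candidates for a downstream patch."""
    cid = change_id(patch)
    if cid is not None and cid in index:
        # Cheap dictionary lookup; no scan over all upstream commits.
        return index[cid]
    # Fallback: expensive fuzzy comparison of subjects over every commit.
    wanted = patch.subject.lower()
    return [
        commit for commit in upstream
        if difflib.SequenceMatcher(None, wanted,
                                   commit.subject.lower()).ratio() >= threshold
    ]
```

The identifier path is a single dictionary lookup, so a changed subject (for
example a prefix added during backporting) does not matter; only patches
without an identifier fall through to the fuzzy scan.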