From: "Rafael J. Wysocki"
To: James Bottomley
Cc: Trond Myklebust, ksummit-discuss@lists.linuxfoundation.org
Date: Sun, 10 Jul 2016 04:15:34 +0200
Subject: Re: [Ksummit-discuss] [CORE TOPIC] stable workflow

On Sunday, July 10, 2016 10:56:10 AM James Bottomley wrote:
> On Sun, 2016-07-10 at 01:43 +0000, Trond Myklebust wrote:
> > > On Jul 9, 2016, at 21:34, James Bottomley <
> > > James.Bottomley@HansenPartnership.com> wrote:
> > >
> > > [duplicate ksummit-discuss@ cc removed]
> > > On Sat, 2016-07-09 at 15:49 +0000, Trond Myklebust wrote:
> > > > > On Jul 9, 2016, at 06:05, James Bottomley <
> > > > > James.Bottomley@HansenPartnership.com> wrote:
> > > > >
> > > > > On Fri, 2016-07-08 at 17:43 -0700, Dmitry Torokhov wrote:
> > > > > > On Sat, Jul 09, 2016 at 02:37:40AM +0200, Rafael J. Wysocki wrote:
> > > > > > > I tend to think that all known bugs should be fixed, at least
> > > > > > > because once they have been fixed, no one needs to remember
> > > > > > > about them any more. :-)
> > > > > > >
> > > > > > > Moreover, minor fixes don't really introduce regressions that
> > > > > > > often
> > > > > >
> > > > > > Famous last words :)
> > > > >
> > > > > Actually, beyond the humour, the idea that small fixes don't
> > > > > introduce regressions must be our most annoying anti-pattern.
> > > > > The reality is that a lot of so called fixes do introduce bugs.
> > > > > The way this happens is that a lot of these "obvious" fixes go
> > > > > through without any deep review (because they're obvious, right?)
> > > > > and the bugs noisily turn up slightly later. The way this works is
> > > > > usually that some code rearrangement is sold as a "fix" and later
> > > > > turns out not to be equivalent to the prior code ... sometimes in
> > > > > incredibly subtle ways. I think we should all be paying much more
> > > > > than lip service to the old adage "If it ain't broke don't fix it".
> > > >
> > > > The main problem with the stable kernel model right now is that we
> > > > have no set of regression tests to apply. Unless someone goes in and
> > > > actually tests each and every stable kernel affected by that "Cc:
> > > > stable" line, then regressions will eventually happen.
> > > >
> > > > So do we want to have another round of "how do we regression test
> > > > the kernel" talks?
> > >
> > > If I look back on our problems, they were all in device drivers, so
> > > generic regression testing wouldn't have picked them up, in fact most
> > > would need specific testing on the actual problem device. So, I don't
> > > really think testing is the issue, I think it's that we commit way too
> > > many "obvious" patches. In SCSI we try to gate it by having a
> > > mandatory Reviewed-by: tag before something gets in, but really perhaps
> > > we should insist on Tested-by: as well ... that way there's some
> > > guarantee that the actual device being modified has been tested.
> >
> > That guarantees that it has been tested on the head of the kernel
> > tree, but it doesn't really tell you much about the behaviour when it
> > hits the stable trees.
>
> The majority of stable regressions are actually patches with subtle
> failures even in the head, so testing on the head properly would have
> eliminated them.

You really sound like you had some statistics on -stable regressions handy,
but is it the case?

The above is my impression too, but then I'm not sure how accurate it is.

> I grant there are some problems where the backport
> itself is flawed but the head works (usually because of missing
> intermediate stuff) but perhaps by insisting on a Tested-by: before
> backporting, we can at least eliminate a significant fraction of
> regressions.

It also depends on how much time it takes for the bug to show up.

For example, if you fixed a bug that's 100% reproducible, but you introduced
another one that happens once in a blue moon in the same commit, it may not
be frequent enough to be caught before the commit goes into -stable.

> > What I'm saying is that we really want some form of unit testing
> > that can be run to perform a minimal validation of the patch when it
> > hits the older tree.
> >
> > Even device drivers have expected outputs for a given input that can
> > be validated through unit testing.
>
> Without the actual hardware, this is difficult ...

Right.

Thanks,
Rafael