From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <20160710223941.GK26097@thunk.org>
References: <20160709000631.GB8989@io.lakedaemon.net>
 <1468024946.2390.21.camel@HansenPartnership.com>
 <20160709093626.GA6247@sirena.org.uk>
 <20160710162203.GA9681@localhost>
 <20160710170117.GI26097@thunk.org>
 <578293C5.1090503@roeck-us.net>
 <20160710223941.GK26097@thunk.org>
From: Olof Johansson
Date: Sun, 10 Jul 2016 18:12:15 -0700
To: "Theodore Ts'o"
Content-Type: multipart/alternative; boundary=001a113fbb5ae3a755053751d7e7
Cc: James Bottomley, ksummit-discuss@lists.linux-foundation.org,
 Jason Cooper
Subject: Re: [Ksummit-discuss] [CORE TOPIC] stable workflow

--001a113fbb5ae3a755053751d7e7
Content-Type: text/plain; charset=UTF-8

On Sun, Jul 10, 2016 at 3:39 PM, Theodore Ts'o <tytso@mit.edu> wrote:

> On Sun, Jul 10, 2016 at 11:28:21AM -0700, Guenter Roeck wrote:
> > > There are **eleven** stable or longterm trees listed on kernel.org.
> >
> > I think this is one of the problems we are having: There are way
> > too many stable / longterm trees.
>
> Part of this is because it's too easy for someone to say, "I want to
> support [34].XX as a stable kernel".  Maybe it will only be for one
> architecture and only used for one platform (e.g. Yocto, or some other
> random distribution), but it's not immediately obvious (a) who is
> going to be using the stable kernel, and (b) what sort of testing it is
> actually getting.
>
> This is fine if stable kernels are advertised as being "best efforts
> only; whatever an individual stable kernel maintainer feels like
> putting into the project".  Which is fine, but then it's also no
> surprise if device kernel maintainers and BSP kernel maintainers
> aren't taking the -stable kernel series.  And it also becomes
> surprising if other people are expecting that stable trees are
> supposed to be more stable than that, and then get indignant when
> there are regressions, bug fixes that aren't backported, bug fixes
> that work fine on the tip but which break after getting backported,
> etc.
>
> To be clear, though: That's the way things are right now, and someone
> who wants to change it is going to have to propose a procedure which
> ends up taking less work on maintainers and individual patch
> submitters, and/or volunteers to do the extra work, or realistically,
> it's not going to happen.
>
> > I think we are having kind of a circular problem: Device/BSP kernels
> > don't track stable because stable branches are considered to be not
> > stable enough, and stable branches are not tested well enough
> > because they are not picked up anyway. The only means to break that
> > circle is to improve stable testing to the point where people do
> > feel comfortable picking it up.
> >
> > The key to solving that problem might be automation. There are lots
> > of tools available nowadays which could be used for that purpose
> > (gerrit, buildbot, ...). Patch submissions to stable releases could
> > be run through an automated test system and only be applied to
> > stable release candidates after all tests passed. This is widely
> > done with vendor kernels today, and should be possible for stable
> > kernels as well. Such a system could even pick up patches tagged
> > with Fixes: or with Cc: stable from mainline automatically.
>
> Testing works fine for core kernel features and for things like file
> systems.  But it really doesn't work with real hardware, and Olof
> described a couple of scenarios where fixes to device drivers broke
> older hardware supported by the same driver.  If what we are most
> worried about is "no regressions", one really extreme approach would
> be for a particular stable kernel series, to have a branch which
> *only* has patches for which reliable and comprehensive tests exist.
> This branch would at least get all of the security fixes and other bug
> fixes which are applicable to the core kernel, but it would filter
> out, at least initially, all or most device driver patches.
>
> We could have another branch which includes the device driver fixes,
> and perhaps over time we could figure out some scheme by which if the
> significant device kernel and BSP kernel users could be convinced to
> contribute hardware and some test engineer resources, maybe some of
> the device driver fixes could go into the "tested" stable branch as
> well.
>
> Or maybe we just leave a clean separation between "core" and "device
> driver" stable branches, since in practice the answer seems to be that
> once an embedded device kernel maintainer gets things working, they
> **really** don't want to touch the device drivers ever again, since if
> there are any hardware or software issues, they want users buying an
> upgraded device every 12-18 months anyway.  :-)    At least that way
> maybe the users will get the core security and stability fixes....
>
> Or maybe we have a different policy for x86-specific device drivers
> than we do for the embedded architectures, since in practice we have
> more end users testing the x86 stable kernels, whereas the embedded
> architectures tend to get things like OTA updates, and so it's not
> surprising that those maintainers are much more paranoid about driver
> changes which might brick their devices.
>
> (Yes, I know that some drivers are shared between x86 and ARM; and I
> suspect that's one of the places where we could easily have a problem
> where a bugfix that fixes things for a device on an x86 base might
> accidentally cause a regression for the same device hanging off of a
> different bus in a SoC configuration....  and no amount of test
> automation has any *hope* of catching those sorts of problems.)
>

Just to clarify, my commentary was NOT for ARM SoC support. It was for
drivers frequently used on x86 laptops. So it's not an "embedded only"
problem.

That being said, this was several years ago, and it's not necessarily
worth focusing all that much on -- I just wanted to give an example of a
case where using -stable in a product tree got pushback and why.


-Olof

--001a113fbb5ae3a755053751d7e7
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable



--001a113fbb5ae3a755053751d7e7--