* [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
@ 2014-05-15 23:13 Dan Williams
  2014-05-16  2:56 ` NeilBrown
  0 siblings, 1 reply; 38+ messages in thread

From: Dan Williams @ 2014-05-15 23:13 UTC (permalink / raw)
To: ksummit-discuss; +Cc: Dan J Williams

What would it take, and would we even consider, moving 2x faster than we are now?  A cursory glance at Facebook's development statistics [1] shows them handling more developers, more commits, and a higher rate of code growth than kernel.org [2].  As mentioned in their development-process presentation, "tools" and "culture" enable such a pace of development without the project flying apart.  Assuming the response to the initial question is not "we're moving fast enough, thank you very much, go away", what would help us move faster?  I submit that there are three topics in this space with aspects that can only be productively discussed in a forum like kernel summit:

1/ Merge Karma: Collect patch review and velocity data for a maintainer to answer questions like "am I pushing too much risk upstream?", "from cycle to cycle am I maintaining a consistent velocity?", and "should I modulate the scope of the review feedback I trust?".  I think where proposals like this have fallen over in the past was with the thought that this data could be used as a weapon by toxic contributors, or used to injure someone's reputation.  Instead, this would be collected consistently (individually?), for private use, and shared in a limited fashion at forums like kernel summit to have data to drive "how are we doing as a community?" discussions.

2/ Gatekeeper: Saying "no" is how we as a kernel community mitigate risk, and it is healthy for us to say "no" early and often.  However, the only real dimension we currently have to say "no" is "no, I won't merge your code".  The staging tree opened up a way to give a qualified "no" by allowing new drivers a sandbox to get in shape for moving into the kernel tree proper while still being available to end users.  The idea with a Facebook-inspired Gatekeeper system is to have another way to say "no" while still merging code.  Consider a facility more fine-grained than the recently deprecated CONFIG_EXPERIMENTAL, with run-time modification capability added.  Similar to loading a staging driver, overriding a Gatekeeper variable (i.e. where a maintainer has explicitly said "no") taints the kernel.  This then becomes a tool for those situations where there is value / need in distributing the code, while still saying "no" to its acceptability in its current state.

3/ LKP and Testing: If there were a generic way for tools like LKP to discover and run per-subsystem / per-driver unit tests, I am fairly confident LKP would already be sending the community test results.  LKP is the closest we have to Facebook's Perflab (an automated regression-testing environment), and it's one of the best tools we have for moving development faster without increasing risk in the code we deliver.  Has the time come for a coordinated unit-test culture in Linux kernel development?

This topic proposal is a self-nomination (dan.j.williams@intel.com) for attending Kernel Summit, and I also nominate Fengguang Wu (fengguang.wu@intel.com) to participate in any discussions that involve LKP.

[1]: http://www.infoq.com/presentations/Facebook-Release-Process
[2]: http://www.linuxfoundation.org/publications/linux-foundation/who-writes-linux-2013

^ permalink raw reply	[flat|nested] 38+ messages in thread
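To sketch how a Gatekeeper knob might differ from CONFIG_EXPERIMENTAL, imagine a per-feature option along these lines; the option name and help text are purely illustrative, and nothing like this exists in the tree:

	config FOO_AWESOME_SAUCE
		bool "foo: provisional awesome-sauce support (gatekept)"
		depends on FOO
		default n
		help
		  This code is merged for wider exposure, but the maintainer
		  has said "no" to its current form.  Enabling it, or flipping
		  the matching run-time knob, taints the kernel, and the
		  functionality may change or disappear in a future release.

The run-time half of the idea - flipping the gate on a booted kernel and taking the taint - is sketched further down-thread.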
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-15 23:13 [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things Dan Williams
@ 2014-05-16  2:56 ` NeilBrown
  2014-05-16 15:04   ` Chris Mason
  2014-05-21  7:22   ` Dan Williams
  0 siblings, 2 replies; 38+ messages in thread

From: NeilBrown @ 2014-05-16 2:56 UTC (permalink / raw)
To: Dan Williams; +Cc: Dan J Williams, ksummit-discuss

On Thu, 15 May 2014 16:13:58 -0700 Dan Williams <dan.j.williams@gmail.com> wrote:

> What would it take, and would we even consider, moving 2x faster than we are now?

Hi Dan,
you seem to be suggesting that there is some limit other than "competent engineering time" which is slowing Linux "progress" down.

Are you really suggesting that?  What might these other limits be?

Certainly there is a limit to the minimum gap between conceptualisation and release (at least one release cycle), but is there really a limit to the parallelism that can be achieved?

NeilBrown

> [...]

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-16  2:56 ` NeilBrown
@ 2014-05-16 15:04   ` Chris Mason
  2014-05-16 17:09     ` Andy Grover
                       ` (2 more replies)
  2014-05-21  7:22   ` Dan Williams
  1 sibling, 3 replies; 38+ messages in thread

From: Chris Mason @ 2014-05-16 15:04 UTC (permalink / raw)
To: NeilBrown, Dan Williams; +Cc: Dan J Williams, ksummit-discuss

On 05/15/2014 10:56 PM, NeilBrown wrote:
> On Thu, 15 May 2014 16:13:58 -0700 Dan Williams <dan.j.williams@gmail.com> wrote:
>
>> What would it take, and would we even consider, moving 2x faster than we are now?
>
> Hi Dan, you seem to be suggesting that there is some limit other than "competent engineering time" which is slowing Linux "progress" down.
>
> Are you really suggesting that?  What might these other limits be?
>
> Certainly there is a limit to the minimum gap between conceptualisation and release (at least one release cycle), but is there really a limit to the parallelism that can be achieved?

I haven't compared the FB commit rates with the kernel's, but I'll pretend Dan's basic thesis is right and talk about which parts of the Facebook model may move faster than the kernel.

The Facebook process is pretty similar to the way the kernel works.  The merge window lasts a few days and the major releases are every week, but overall it isn't too far away.

One big difference is that we have a centralized tool for reviewing the patches, and once a patch has been reviewed by a specific number of people, you push it in.

The patch submission tool runs the patch through lint and various static analyses to make sure it follows proper coding style and doesn't include patterns of known bugs.  This cuts down on the review work because the silly coding style mistakes are gone before the patch gets to the tool.

When you put in a patch, you have to put in reviewers, and they get a little notification that your patch needs review.  Once the reviewers are happy, you push the patch in.

The biggest difference: there are no maintainers.  If I want to go change the calendar tool to fix a bug, I patch it, get someone else to sign off, and push.

All of which is my way of saying the maintainers (me included) are the biggest bottleneck.  There are a lot of reasons I think the maintainer model fits the kernel better, but at least for btrfs I'm trying to speed up the patch review process and use patchwork more effectively.

Facebook controls risk with new features using gatekeepers in the code.  That way we can beta test larger changes against an expanding group of unsuspecting people without turning a feature on for everyone at once.

It's also much easier to accept risk when you have complete control over deployment.  facebook.com rolls out twice a day, which is basically a stable tree cherry-picked from tip, and there are plenty of chances to fix problems.

Android/iPhone releases can't be controlled the same way, and so they have longer testing periods.
-chris

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-16 15:04 ` Chris Mason
@ 2014-05-16 17:09   ` Andy Grover
  2014-05-23  8:11     ` Dan Carpenter
  0 siblings, 1 reply; 38+ messages in thread

From: Andy Grover @ 2014-05-16 17:09 UTC (permalink / raw)
To: Chris Mason, NeilBrown, Dan Williams; +Cc: Dan J Williams, ksummit-discuss

On 05/16/2014 08:04 AM, Chris Mason wrote:
> The biggest difference: there are no maintainers.  If I want to go change the calendar tool to fix a bug, I patch it, get someone else to sign off, and push.
>
> All of which is my way of saying the maintainers (me included) are the biggest bottleneck.  There are a lot of reasons I think the maintainer model fits the kernel better, but at least for btrfs I'm trying to speed up the patch review process and use patchwork more effectively.

Dan and Chris, you talked about some technical differences and solutions, but here are some thoughts/questions I had just on the non-technical side of things for the ksummit:

* Big differences vs. corporate development:
  - No one can be told what to do by a common boss.
  - There is no assumption that co-contributors have a basic level of competence, so sign-offs may not mean much.
  - A co-contributor's area of development may have no, or negative, value for the maintainer (see "tinification" for an example).
  - Co-contributors may work for competing companies.

* Forking the project, the traditional FOSS remedy for bad-maintainer / moving-too-slow, is not realistically available for the kernel or per subsystem, due to massive momentum.

* If the maintainer is unresponsive, what recourse does a submitter have?  (Is this written down somewhere?)  Is taking recourse actually culturally acceptable?  How would the gone-around maintainer treat future submissions?

* At what point does it make sense to field a sub- or co-maintainer?

* Would more maintainer delegation help contributor recruitment and continued involvement?  Versus the efficiency of highly optimized patch flows by fewer maintainers.

* Do current maintainers feel they cannot delegate or relinquish maintainership?  Maintainership-as-a-burden vs. maintainership-as-lead-developer vs. maintainership-as-a-career-goal.

* Are there other large-scale FOSS projects that may have development flows worth drawing lessons from?

Thanks -- Andy

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-16 17:09 ` Andy Grover
@ 2014-05-23  8:11   ` Dan Carpenter
  0 siblings, 0 replies; 38+ messages in thread

From: Dan Carpenter @ 2014-05-23 8:11 UTC (permalink / raw)
To: Andy Grover; +Cc: ksummit-discuss

On Fri, May 16, 2014 at 10:09:51AM -0700, Andy Grover wrote:
> * If the maintainer is unresponsive, what recourse does a submitter have?  (Is this written down somewhere?)  Is taking recourse actually culturally acceptable?  How would the gone-around maintainer treat future submissions?

You ping the maintainer after a month of no response.  If there is still no response after month two, then you try to send it through Andrew.  If the maintainer responds but requests changes, then you have to do what he says.  If the maintainer responds but just NAKs your patch, then you're pretty much screwed.  If you really care, then you have to carry those patches out of tree.

> * At what point does it make sense to field a sub- or co-maintainer?

I don't maintain a git tree, but the mechanical bits don't seem that hard to me.  It's reviewing the code which is tricky.  All patches should be going to a list (LKML doesn't count) so anyone can help review patches.  But how do we get people to review each other's patches?

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-16 15:04 ` Chris Mason
  2014-05-16 17:09   ` Andy Grover
@ 2014-05-16 18:31   ` Randy Dunlap
  2014-05-21  7:48   ` Dan Williams
  2 siblings, 0 replies; 38+ messages in thread

From: Randy Dunlap @ 2014-05-16 18:31 UTC (permalink / raw)
To: Chris Mason, NeilBrown, Dan Williams; +Cc: Dan J Williams, ksummit-discuss

On 05/16/2014 08:04 AM, Chris Mason wrote:
> I haven't compared the FB commit rates with the kernel's, but I'll pretend Dan's basic thesis is right and talk about which parts of the Facebook model may move faster than the kernel.

Thanks for the summary.

> The patch submission tool runs the patch through lint and various static analyses to make sure it follows proper coding style and doesn't include patterns of known bugs.  This cuts down on the review work because the silly coding style mistakes are gone before the patch gets to the tool.

Yes, this is very nice.  Reviewers should not be burdened with checking coding style or common issues or build problems or kconfig problems.  They should just be able to review the merits and correctness of the patch.  (Yes, that would mean that I would find something different to do on most days.  :)

> The biggest difference: there are no maintainers.  If I want to go change the calendar tool to fix a bug, I patch it, get someone else to sign off, and push.
>
> All of which is my way of saying the maintainers (me included) are the biggest bottleneck.

I have to agree (me included).

> There are a lot of reasons I think the maintainer model fits the kernel better, but at least for btrfs I'm trying to speed up the patch review process and use patchwork more effectively.
>
> [...]

--
~Randy

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-16 15:04 ` Chris Mason
  2014-05-16 17:09   ` Andy Grover
  2014-05-16 18:31   ` Randy Dunlap
@ 2014-05-21  7:48   ` Dan Williams
  2014-05-21  7:55     ` Greg KH
  2014-05-21  8:25     ` NeilBrown
  2 siblings, 2 replies; 38+ messages in thread

From: Dan Williams @ 2014-05-21 7:48 UTC (permalink / raw)
To: Chris Mason; +Cc: ksummit-discuss

On Fri, May 16, 2014 at 8:04 AM, Chris Mason <clm@fb.com> wrote:
> [...]
>
> The biggest difference: there are no maintainers.  If I want to go change the calendar tool to fix a bug, I patch it, get someone else to sign off, and push.
>
> All of which is my way of saying the maintainers (me included) are the biggest bottleneck.  There are a lot of reasons I think the maintainer model fits the kernel better, but at least for btrfs I'm trying to speed up the patch review process and use patchwork more effectively.

To be clear, I'm not arguing for a maintainer-less model.  We don't have the tooling or operational data to support that.  We need maintainers to say "no".  But what I think we can do is give maintainers more varied ways to say it.  The goal: de-escalate the merge event as a declaration that the code-quality/architecture conversation is over.

Release early, release often, and, with care, merge often.

With regards to saying "no" faster, it seems kernel code rarely comes with tests.  However, maintainers today are already able to reduce the latency to "no" when the 0-day kbuild robot emits a negative test result.  Why not arm that system with tests it can autodiscover?  What has held back a unit-test culture in the kernel?

^ permalink raw reply	[flat|nested] 38+ messages in thread
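To make "tests it can autodiscover" concrete, here is a minimal sketch of the kind of test a robot could find and run mechanically, loosely following the tools/testing/selftests convention that a test is a standalone program whose exit status is the verdict.  The file name, function, and checked property below are all hypothetical:

	/* tools/testing/selftests/foo/foo-basic.c (hypothetical) */
	#include <stdio.h>
	#include <stdlib.h>

	/* Stand-in for whatever property the subsystem wants verified. */
	static int foo_roundtrip_ok(void)
	{
		return 1;
	}

	int main(void)
	{
		if (!foo_roundtrip_ok()) {
			printf("selftests: foo-basic [FAIL]\n");
			return EXIT_FAILURE;	/* non-zero exit = regression */
		}
		printf("selftests: foo-basic [PASS]\n");
		return EXIT_SUCCESS;
	}

A robot like the 0-day builder could then walk tools/testing/selftests, build every directory it finds, and report any non-zero exit status without knowing anything subsystem-specific.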
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21  7:48 ` Dan Williams
@ 2014-05-21  7:55   ` Greg KH
  2014-05-21  9:05     ` Matt Fleming
  1 sibling, 1 reply; 38+ messages in thread

From: Greg KH @ 2014-05-21 7:55 UTC (permalink / raw)
To: Dan Williams; +Cc: ksummit-discuss

On Wed, May 21, 2014 at 12:48:48AM -0700, Dan Williams wrote:
> With regards to saying "no" faster, it seems kernel code rarely comes with tests.  However, maintainers today are already able to reduce the latency to "no" when the 0-day kbuild robot emits a negative test result.  Why not arm that system with tests it can autodiscover?  What has held back a unit-test culture in the kernel?

The fact that no one has stepped up and taken maintainership of the tests to ensure that they continue to work.

I'm working on solving that issue by getting funding for someone to do this and focus on tests that all developers and maintainers can use to help ensure that nothing breaks when they make a change.

Give me a few months; hopefully there will be something I can talk about soon in this area.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21  7:55 ` Greg KH
@ 2014-05-21  9:05   ` Matt Fleming
  2014-05-21 12:52     ` Greg KH
  0 siblings, 1 reply; 38+ messages in thread

From: Matt Fleming @ 2014-05-21 9:05 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss

On Wed, 21 May, at 04:55:47PM, Greg KH wrote:
> On Wed, May 21, 2014 at 12:48:48AM -0700, Dan Williams wrote:
>> With regards to saying "no" faster, it seems kernel code rarely comes with tests.  However, maintainers today are already able to reduce the latency to "no" when the 0-day kbuild robot emits a negative test result.  Why not arm that system with tests it can autodiscover?  What has held back a unit-test culture in the kernel?
>
> The fact that no one has stepped up and taken maintainership of the tests to ensure that they continue to work.

That's not usually how unit tests work.  They're supposed to be owned by everyone, i.e. if your change breaks the test, you are responsible for fixing your change, or the test, or both.  Everyone needs to ensure the tests continue to work.  Likewise, the person implementing a new feature is the best equipped to write tests for it.  Unfortunately that does require a certain amount of "buy-in" from the community.

However, a maintainer role might make sense for collating test results, reporting failures, or running the tests on a large number of hardware configurations - like how Fengguang Wu says "The 0-day infrastructure shows your commit introduced a regression" or Stephen Rothwell says "A merge of your tree causes these conflicts".

For anything other than trivial cases I wouldn't expect these guys to have to fix up the breakage to ensure the tests continue working - that kind of never-ending battle would make a person's head explode.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21  9:05 ` Matt Fleming
@ 2014-05-21 12:52   ` Greg KH
  2014-05-21 13:23     ` Matt Fleming
  0 siblings, 1 reply; 38+ messages in thread

From: Greg KH @ 2014-05-21 12:52 UTC (permalink / raw)
To: Matt Fleming; +Cc: ksummit-discuss

On Wed, May 21, 2014 at 10:05:13AM +0100, Matt Fleming wrote:
> That's not usually how unit tests work.  They're supposed to be owned by everyone, i.e. if your change breaks the test, you are responsible for fixing your change, or the test, or both.  Everyone needs to ensure the tests continue to work.

Ideal world, meet the real world.

Today, the in-kernel tests are broken.  Whether that is a kernel problem or a test problem, no one seems to be stepping up to figure it out.  Someone needs to be on top of it to do that.  Given that no one has done that for, well, ever, this needs to be fixed.

> Likewise, the person implementing a new feature is the best equipped to write tests for it.  Unfortunately that does require a certain amount of "buy-in" from the community.

We have that "buy-in", and have had it for a long time.  I've been asking for this for years, and finally have the ear of people who are able to allocate resources for it.  Which is a very nice chance that I do not want to blow.

> However, a maintainer role might make sense for collating test results, reporting failures, or running the tests on a large number of hardware configurations [...]
>
> For anything other than trivial cases I wouldn't expect these guys to have to fix up the breakage to ensure the tests continue working - that kind of never-ending battle would make a person's head explode.

Agreed, but given that no one has even tried, and that I know of someone who has agreed to do this work, let's give it a chance.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21 12:52 ` Greg KH
@ 2014-05-21 13:23   ` Matt Fleming
  0 siblings, 0 replies; 38+ messages in thread

From: Matt Fleming @ 2014-05-21 13:23 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss

On Wed, 21 May, at 09:52:18PM, Greg KH wrote:
>
> Ideal world, meet the real world.
>
> Today, the in-kernel tests are broken.  Whether that is a kernel problem or a test problem, no one seems to be stepping up to figure it out.  Someone needs to be on top of it to do that.  Given that no one has done that for, well, ever, this needs to be fixed.

Which tests are broken?  FWIW, the EFI tests in tools/testing/selftests work fine, because I regularly run them against any changes I merge.

> Agreed, but given that no one has even tried, and that I know of someone who has agreed to do this work, let's give it a chance.

Go for it!

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 38+ messages in thread
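For readers unfamiliar with the selftests layout: runs like the one Matt describes are normally driven from the top of the tree with something like the command below.  The exact TARGETS value depends on how a given test directory is wired into tools/testing/selftests/Makefile, so treat the target name as illustrative.

	$ make -C tools/testing/selftests TARGETS=efi run_tests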
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21  7:48 ` Dan Williams
  2014-05-21  7:55   ` Greg KH
@ 2014-05-21  8:25   ` NeilBrown
  2014-05-21  8:36     ` Dan Williams
  1 sibling, 1 reply; 38+ messages in thread

From: NeilBrown @ 2014-05-21 8:25 UTC (permalink / raw)
To: Dan Williams; +Cc: ksummit-discuss

On Wed, 21 May 2014 00:48:48 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> [...]
>
> To be clear, I'm not arguing for a maintainer-less model.  We don't have the tooling or operational data to support that.  We need maintainers to say "no".  But what I think we can do is give maintainers more varied ways to say it.  The goal: de-escalate the merge event as a declaration that the code-quality/architecture conversation is over.
>
> Release early, release often, and, with care, merge often.

I think this falls foul of the "no regressions" rule.

The kernel policy is that once functionality gets to users, it cannot be taken away.  Individual drivers in 'staging' manage to avoid this rule because they are clearly separate things.  New system calls and attributes in sysfs etc. seem to be much harder to "partially" release.

To quote from Linus in a recent interview:

  "I personally think the stable development model is not one of continual incremental improvements, but a succession of overshooting and crashing."

Yet Linux is stuck in "incremental improvement" mode and is not in a position to "overshoot and crash" much.

I agree there is a problem.  I can't see a solution.

NeilBrown

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21  8:25 ` NeilBrown
@ 2014-05-21  8:36   ` Dan Williams
  2014-05-21  8:53     ` Matt Fleming
  2014-05-21 10:11     ` NeilBrown
  0 siblings, 2 replies; 38+ messages in thread

From: Dan Williams @ 2014-05-21 8:36 UTC (permalink / raw)
To: NeilBrown; +Cc: ksummit-discuss

On Wed, May 21, 2014 at 1:25 AM, NeilBrown <neilb@suse.de> wrote:
> [...]
>
>> Release early, release often, and, with care, merge often.
>
> I think this falls foul of the "no regressions" rule.
>
> The kernel policy is that once functionality gets to users, it cannot be taken away.  Individual drivers in 'staging' manage to avoid this rule because they are clearly separate things.
> New system calls and attributes in sysfs etc. seem to be much harder to "partially" release.

My straw man is something like the following for driver "foo":

	if (gatekeeper_foo_new_awesome_sauce)
		do_new_thing();

Where setting gatekeeper_foo_new_awesome_sauce taints the kernel and warns that there is no guarantee of this functionality being present in the same form, or at all, going forward.

^ permalink raw reply	[flat|nested] 38+ messages in thread
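Fleshing the straw man out a little: if the knob were exposed as a module parameter, the enable path might look roughly like the sketch below.  All names are hypothetical, and add_taint(TAINT_USER, ...) is just one plausible way to record the override; a real implementation might well want a dedicated taint flag.

	#include <linux/module.h>
	#include <linux/kernel.h>

	/* Hypothetical gatekeeper knob; default off because the maintainer said "no". */
	static bool new_awesome_sauce;
	module_param(new_awesome_sauce, bool, 0444);
	MODULE_PARM_DESC(new_awesome_sauce,
			 "Enable provisional foo code paths (taints the kernel)");

	static void do_new_thing(void)
	{
		/* The provisional feature would live here. */
	}

	static int __init foo_init(void)
	{
		if (new_awesome_sauce) {
			pr_warn("foo: provisional feature enabled; it may change or vanish in any release\n");
			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
			do_new_thing();
		}
		return 0;
	}
	module_init(foo_init);

	MODULE_LICENSE("GPL");

Booting with foo.new_awesome_sauce=1 (or writing the sysfs parameter, if it were made writable) would then be the explicit, logged, kernel-tainting act of overriding the maintainer's "no".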
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21  8:36 ` Dan Williams
@ 2014-05-21  8:53   ` Matt Fleming
  0 siblings, 0 replies; 38+ messages in thread

From: Matt Fleming @ 2014-05-21 8:53 UTC (permalink / raw)
To: Dan Williams; +Cc: ksummit-discuss

On Wed, 21 May, at 01:36:55AM, Dan Williams wrote:
>
> My straw man is something like the following for driver "foo":
>
> 	if (gatekeeper_foo_new_awesome_sauce)
> 		do_new_thing();
>
> Where setting gatekeeper_foo_new_awesome_sauce taints the kernel and warns that there is no guarantee of this functionality being present in the same form, or at all, going forward.

This kind of thing is done all the time for web development - I think it's given the name "feature bit".  It makes sense when you control the execution environment, like a web server, and if things explode you can detect that on the web-server end, and not necessarily require your user to report the problem.

It also makes a lot of sense for continuous deployment, where the master branch is always the branch used in production.

When a user needs to actively enable a feature and report problems, it's just like another CONFIG_* option, and I'm not sure that's an improvement.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21  8:36 ` Dan Williams
  2014-05-21  8:53   ` Matt Fleming
@ 2014-05-21 10:11   ` NeilBrown
  2014-05-21 15:35     ` Dan Williams
  1 sibling, 1 reply; 38+ messages in thread

From: NeilBrown @ 2014-05-21 10:11 UTC (permalink / raw)
To: Dan Williams; +Cc: ksummit-discuss

On Wed, 21 May 2014 01:36:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> [...]
>
> My straw man is something like the following for driver "foo":
>
> 	if (gatekeeper_foo_new_awesome_sauce)
> 		do_new_thing();
>
> Where setting gatekeeper_foo_new_awesome_sauce taints the kernel and warns that there is no guarantee of this functionality being present in the same form, or at all, going forward.

Interesting idea.  Trying to imagine how this might play out in practice....

You talk about "value delivered to users".  But users tend to use applications, and applications are the users of kernel features.

Will anyone bother writing or adapting an application to use a feature which is not guaranteed to hang around?  Maybe they will, but will the users of the application know that it might stop working after a kernel upgrade?  Maybe...

Maybe it would help if we had some concrete examples of features that could have been delayed using a gatekeeper.

The one that springs to my mind is cgroups.  Clearly useful, but clearly controversial.  It appears that the original implementation was seriously flawed, and Tejun is doing a massive amount of work to "fix" it, and this apparently will lead to API changes.  And this is happening without any gatekeepers.  Would it have been easier in some way with gatekeepers?  ... I don't see how it would be, except that fewer people would have used cgroups, and then maybe we wouldn't have as much collective experience to know what the real problems were(?).

I think that is the key.  With a user-facing option, people will try it and probably cope if it disappears (though they might complain loudly and sign petitions declaring facebook to be the anti-$DEITY).  However, with kernel-internal options, applications are unlikely to use them without some expectation of stability.  So finding the problems would be a lot harder.

Which doesn't mean that it can't work, but it would be nice to create some real-life examples to see how it plays out in practice.

NeilBrown

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21 10:11 ` NeilBrown
@ 2014-05-21 15:35   ` Dan Williams
  2014-05-21 23:06     ` Rafael J. Wysocki
  2014-05-21 23:48     ` NeilBrown
  0 siblings, 2 replies; 38+ messages in thread

From: Dan Williams @ 2014-05-21 15:35 UTC (permalink / raw)
To: NeilBrown; +Cc: ksummit-discuss

On Wed, May 21, 2014 at 3:11 AM, NeilBrown <neilb@suse.de> wrote:
> [...]
>
> I think that is the key.  With a user-facing option, people will try it and probably cope if it disappears (though they might complain loudly and sign petitions declaring facebook to be the anti-$DEITY).  However, with kernel-internal options, applications are unlikely to use them without some expectation of stability.  So finding the problems would be a lot harder.
>
> Which doesn't mean that it can't work, but it would be nice to create some real-life examples to see how it plays out in practice.

Biased by my background of course, but I think driver development is more amenable to this sort of approach.  For drivers, the kernel is in many instances the application.  For example, I currently have in my review queue a patch set to add SATA port multiplier support to libsas.  I hope I get the review done in time for merging it in 3.16.  But what if I also had the option of saying "let's gatekeeper this for a cycle"?  Users that care could start using it and reporting bugs, and it would be clear that the implementation is provisional.  My opinion is that bug reports would attract deeper code review that otherwise would not occur if the feature were simply delayed for a cycle.

I think I also would have liked to use a gatekeeper to stage the deletion of NET_DMA from the kernel.  Mark it for removal, see who screams, but still make it straightforward for such people to make their case, with data, for why the value should stay.
For the core kernel, which I admittedly have not touched much: are there cases where an application wants to make a value argument to users, but needs some kernel infrastructure to stand on?  Do we inadvertently stifle otherwise promising experiments by forcing upstream acceptance before the experiment gets the exposure it needs?

^ permalink raw reply	[flat|nested] 38+ messages in thread
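The staged-removal case could be sketched the same way, with the gate defaulting to "removed" so that anyone who flips it self-identifies; again, every name here is hypothetical:

	#include <linux/module.h>
	#include <linux/kernel.h>

	/* Hypothetical gate for code already marked for removal (NET_DMA-style). */
	static bool legacy_offload;
	module_param(legacy_offload, bool, 0444);
	MODULE_PARM_DESC(legacy_offload,
			 "Keep the deprecated offload path alive one more cycle");

	static int do_legacy_offload_copy(void)
	{
		return 0;	/* stand-in for the deprecated fast path */
	}

	static int maybe_offload_copy(void)
	{
		if (!legacy_offload)
			return -ENODEV;	/* default: behave as if already removed */

		/* Whoever flips the gate is the "who screams?" data point. */
		pr_warn_once("foo: deprecated offload enabled; scheduled for removal\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
		return do_legacy_offload_copy();
	}

	static int __init foo_init(void)
	{
		maybe_offload_copy();	/* probe the gate once at load time */
		return 0;
	}
	module_init(foo_init);

	MODULE_LICENSE("GPL");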
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
  2014-05-21 15:35 ` Dan Williams
@ 2014-05-21 23:06   ` Rafael J. Wysocki
  2014-05-21 23:03     ` Dan Williams
  0 siblings, 1 reply; 38+ messages in thread

From: Rafael J. Wysocki @ 2014-05-21 23:06 UTC (permalink / raw)
To: Dan Williams; +Cc: ksummit-discuss

On Wednesday, May 21, 2014 08:35:55 AM Dan Williams wrote:
> [...]
>
> Biased by my background of course, but I think driver development is more amenable to this sort of approach.  For drivers, the kernel is in many instances the application.  For example, I currently have in my review queue a patch set to add SATA port multiplier support to libsas.  I hope I get the review done in time for merging it in 3.16.  But what if I also had the option of saying "let's gatekeeper this for a cycle"?  Users that care could start using it and reporting bugs, and it would be clear that the implementation is provisional.  My opinion is that bug reports would attract deeper code review that otherwise would not occur if the feature were simply delayed for a cycle.

There's more to that.

The model you're referring to is only possible if all participants are employees of one company, or otherwise members of one organization that has some kind of control over them.  Kernel development is not done like that, though, so I'm afraid that the Facebook experience is not applicable here directly.

For example, we take patches from pretty much everyone on the Internet.  Does Facebook do that too?  I don't think so.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-21 23:06 ` Rafael J. Wysocki @ 2014-05-21 23:03 ` Dan Williams 2014-05-21 23:40 ` Laurent Pinchart ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Dan Williams @ 2014-05-21 23:03 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: ksummit-discuss On Wed, May 21, 2014 at 4:06 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Wednesday, May 21, 2014 08:35:55 AM Dan Williams wrote: >> On Wed, May 21, 2014 at 3:11 AM, NeilBrown <neilb@suse.de> wrote: >> > On Wed, 21 May 2014 01:36:55 -0700 Dan Williams <dan.j.williams@intel.com> >> > wrote: >> > >> >> On Wed, May 21, 2014 at 1:25 AM, NeilBrown <neilb@suse.de> wrote: >> >> > On Wed, 21 May 2014 00:48:48 -0700 Dan Williams <dan.j.williams@intel.com> >> >> > wrote: >> >> > >> >> >> On Fri, May 16, 2014 at 8:04 AM, Chris Mason <clm@fb.com> wrote: >> >> >> > -----BEGIN PGP SIGNED MESSAGE----- >> >> >> > Hash: SHA1 >> >> >> > >> >> >> > On 05/15/2014 10:56 PM, NeilBrown wrote: >> >> >> >> On Thu, 15 May 2014 16:13:58 -0700 Dan Williams >> >> >> >> <dan.j.williams@gmail.com> wrote: >> >> >> >> >> >> >> >>> What would it take and would we even consider moving 2x faster >> >> >> >>> than we are now? >> >> >> >> >> >> >> >> Hi Dan, you seem to be suggesting that there is some limit other >> >> >> >> than "competent engineering time" which is slowing Linux "progress" >> >> >> >> down. >> >> >> >> >> >> >> >> Are you really suggesting that? What might these other limits be? >> >> >> >> >> >> >> >> Certainly there are limits to minimum gap between conceptualisation >> >> >> >> and release (at least one release cycle), but is there really a >> >> >> >> limit to the parallelism that can be achieved? >> >> >> > >> >> >> > I haven't compared the FB commit rates with the kernel, but I'll >> >> >> > pretend Dan's basic thesis is right and talk about which parts of the >> >> >> > facebook model may move faster than the kernel. >> >> >> > >> >> >> > The facebook is pretty similar to the way the kernel works. The merge >> >> >> > window lasts a few days and the major releases are every week, but >> >> >> > overall it isn't too far away. >> >> >> > >> >> >> > The biggest difference is that we have a centralized tool for >> >> >> > reviewing the patches, and once it has been reviewed by a specific >> >> >> > number of people, you push it in. >> >> >> > >> >> >> > The patch submission tool runs the patch through lint and various >> >> >> > static analysis to make sure it follows proper coding style and >> >> >> > doesn't include patterns of known bugs. This cuts down on the review >> >> >> > work because the silly coding style mistakes are gone before it gets >> >> >> > to the tool. >> >> >> > >> >> >> > When you put in a patch, you have to put in reviewers, and they get a >> >> >> > little notification that your patch needs review. Once the reviewers >> >> >> > are happy, you push the patch in. >> >> >> > >> >> >> > The biggest difference: there are no maintainers. If I want to go >> >> >> > change the calendar tool to fix a bug, I patch it, get someone else to >> >> >> > sign off and push. >> >> >> > >> >> >> > All of which is my way of saying the maintainers (me included) are the >> >> >> > biggest bottleneck. There are a lot of reasons I think the maintainer >> >> >> > model fits the kernel better, but at least for btrfs I'm trying to >> >> >> > speed up the patch review process and use patchwork more effectively. 
>> >> >> >> >> >> To be clear, I'm not arguing for a maintainer-less model. We don't >> >> >> have the tooling or operational-data to support that. We need >> >> >> maintainers to say "no". But, what I think we can do is give >> >> >> maintainers more varied ways to say it. The goal, de-escalate the >> >> >> merge event as a declaration that the code quality/architecture >> >> >> conversation is over. >> >> >> >> >> >> Release early, release often, and with care merge often. >> >> > >> >> > I think this falls foul of the "no regressions" rule. >> >> > >> >> > The kernel policy is that once the functionality gets to users, it cannot be >> >> > taken away. Individual drivers in 'staging' manage to avoid this rule >> >> > because that are clearly separate things. >> >> > New system calls and attributes in sysfs etc seem to be much harder to >> >> > "partially" release. >> >> >> >> My straw man is something like the following for driver "foo" >> >> >> >> if (gatekeeper_foo_new_awesome_sauce) >> >> do_new_thing(); >> >> >> >> Where setting gatekeeper_foo_new_awesome_sauce taints the kernel and >> >> warns that there is no guarantee of this functionality being present >> >> in the same form or at all going forward. >> > >> > Interesting idea. >> > Trying to imagine how this might play out in practice.... >> > >> > You talk about "value delivered to users". But users tend to use >> > applications, and applications are the users of kernel features. >> > >> > Will anyone bother writing or adapting an application to use a feature which >> > is not guaranteed to hang around? >> > Maybe they will, but will the users of the application know that it might >> > stop working after a kernel upgrade? Maybe... >> > >> > Maybe if we had some concrete examples of features that could have been >> > delayed using a gatekeeper. >> > >> > The one that springs to my mind is cgroups. Clearly useful, but clearly >> > controversial. It appears that the original implementation was seriously >> > flawed and Tejun is doing a massive amount of work to "fix" it, and this >> > apparently will lead to API changes. And this is happening without any >> > gatekeepers. Would it have been easier in some way with gatekeepers? >> > ... I don't see how it would be, except that fewer people would have used >> > cgroups, and then maybe we wouldn't have as much collective experience to >> > know what the real problems were(?). >> > >> > I think that is the key. With a user-facing option, people will try it and >> > probably cope if it disappears (though they might complain loudly and sign >> > petitions declaring facebook to be the anti-$DEITY). However with kernel >> > internal options, applications are unlikely to use them without some >> > expectation of stability. So finding the problems would be a lot harder. >> > >> > Which doesn't mean that it can't work, but it would be nice if create some >> > real life examples to see how it plays out in practice. >> > >> >> Biased by my background of course, but I think driver development is >> more amenable to this sort of approach. For drivers the kernel is in >> many instances the application. For example, I currently have in my >> review queue a patch set to add sata port multiplier support to >> libsas. I hope I get the review done in time for merging it in 3.16. >> But, what if I also had the option of saying "let's gatekeeper this >> for a cycle". Users that care could start using it and reporting >> bugs, and it would be clear that the implementation is provisional. 
>> My opinion is that bug reports would attract deeper code review that >> otherwise would not occur if the feature was simply delayed for a >> cycle. > > There's more to that. > > The model you're referring to is only possible if all participants are > employees of one company or otherwise members of one organization that > has some kind of control over them. The kernel development is not done > like that, though, so I'm afraid that the Facebook experience is not > applicable here directly. > > For example, we take patches from pretty much everyone on the Internet. > Does Facebook do that too? I don't think so. > I'm struggling to see how this addresses my new libsas feature example? Simply put, if an end user knows how to override a "gatekeeper", that user can test features that we are otherwise still debating upstream. They can of course also apply the patches directly, but I am proposing we formalize a mechanism to encourage more experimentation in-tree. I'm fully aware we do not have the tactical data nor operational control to run the kernel like a website; that's not my concern. My concern is with expanding a maintainer's options for mitigating risk. ^ permalink raw reply [flat|nested] 38+ messages in thread
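For concreteness, a minimal sketch of what such a gatekeeper override could look like for a hypothetical driver "foo". The gatekeeper_* and foo_* names are invented for illustration (this is not an existing facility); module_param_cb(), param_set_bool(), and add_taint() are existing kernel APIs, though whether TAINT_USER or a dedicated taint flag is appropriate would itself be part of the debate:

    #include <linux/module.h>
    #include <linux/moduleparam.h>
    #include <linux/kernel.h>

    /* Default off: the maintainer has said "no" for now. */
    static bool foo_new_awesome_sauce;

    static int gatekeeper_set(const char *val, const struct kernel_param *kp)
    {
            int rc = param_set_bool(val, kp);

            /* Overriding the gatekeeper taints the kernel and warns that
             * the feature may change form, or disappear, in a later cycle. */
            if (rc == 0 && foo_new_awesome_sauce) {
                    pr_warn("foo: provisional feature enabled, no stability guarantee\n");
                    add_taint(TAINT_USER, LOCKDEP_STILL_OK);
            }
            return rc;
    }

    static const struct kernel_param_ops gatekeeper_ops = {
            .set = gatekeeper_set,
            .get = param_get_bool,
    };
    module_param_cb(new_awesome_sauce, &gatekeeper_ops,
                    &foo_new_awesome_sauce, 0644);

The driver's code path then reads exactly as in the straw man: if (foo_new_awesome_sauce) do_new_thing();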
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-21 23:03 ` Dan Williams @ 2014-05-21 23:40 ` Laurent Pinchart 2014-05-22 0:10 ` Rafael J. Wysocki 2014-05-22 15:48 ` Theodore Ts'o 2 siblings, 0 replies; 38+ messages in thread From: Laurent Pinchart @ 2014-05-21 23:40 UTC (permalink / raw) To: ksummit-discuss On Wednesday 21 May 2014 16:03:49 Dan Williams wrote: > On Wed, May 21, 2014 at 4:06 PM, Rafael J. Wysocki wrote: > > On Wednesday, May 21, 2014 08:35:55 AM Dan Williams wrote: > >> On Wed, May 21, 2014 at 3:11 AM, NeilBrown wrote: > >> > On Wed, 21 May 2014 01:36:55 -0700 Dan Williams wrote: > >> >> On Wed, May 21, 2014 at 1:25 AM, NeilBrown wrote: > >> >> > On Wed, 21 May 2014 00:48:48 -0700 Dan Williams wrote: > >> >> >> On Fri, May 16, 2014 at 8:04 AM, Chris Mason <clm@fb.com> wrote: > >> >> >> > -----BEGIN PGP SIGNED MESSAGE----- > >> >> >> > Hash: SHA1 > >> >> >> > > >> >> >> > On 05/15/2014 10:56 PM, NeilBrown wrote: > >> >> >> >> On Thu, 15 May 2014 16:13:58 -0700 Dan Williams wrote: > >> >> >> >>> What would it take and would we even consider moving 2x faster > >> >> >> >>> than we are now? > >> >> >> >> > >> >> >> >> Hi Dan, you seem to be suggesting that there is some limit other > >> >> >> >> than "competent engineering time" which is slowing Linux > >> >> >> >> "progress" down. > >> >> >> >> > >> >> >> >> Are you really suggesting that? What might these other limits > >> >> >> >> be? > >> >> >> >> > >> >> >> >> Certainly there are limits to minimum gap between > >> >> >> >> conceptualisation and release (at least one release cycle), but > >> >> >> >> is there really a limit to the parallelism that can be achieved? > >> >> >> > > >> >> >> > I haven't compared the FB commit rates with the kernel, but I'll > >> >> >> > pretend Dan's basic thesis is right and talk about which parts of > >> >> >> > the facebook model may move faster than the kernel. > >> >> >> > > >> >> >> > The facebook is pretty similar to the way the kernel works. The > >> >> >> > merge window lasts a few days and the major releases are every > >> >> >> > week, but overall it isn't too far away. > >> >> >> > > >> >> >> > The biggest difference is that we have a centralized tool for > >> >> >> > reviewing the patches, and once it has been reviewed by a > >> >> >> > specific number of people, you push it in. > >> >> >> > > >> >> >> > The patch submission tool runs the patch through lint and various > >> >> >> > static analysis to make sure it follows proper coding style and > >> >> >> > doesn't include patterns of known bugs. This cuts down on the > >> >> >> > review work because the silly coding style mistakes are gone > >> >> >> > before it gets to the tool. > >> >> >> > > >> >> >> > When you put in a patch, you have to put in reviewers, and they > >> >> >> > get a little notification that your patch needs review. Once the > >> >> >> > reviewers are happy, you push the patch in. > >> >> >> > > >> >> >> > The biggest difference: there are no maintainers. If I want to > >> >> >> > go change the calendar tool to fix a bug, I patch it, get someone > >> >> >> > else to sign off and push. > >> >> >> > > >> >> >> > All of which is my way of saying the maintainers (me included) > >> >> >> > are the biggest bottleneck. There are a lot of reasons I think > >> >> >> > the maintainer model fits the kernel better, but at least for > >> >> >> > btrfs I'm trying to speed up the patch review process and use > >> >> >> > patchwork more effectively. 
> >> >> >> > >> >> >> To be clear, I'm not arguing for a maintainer-less model. We don't > >> >> >> have the tooling or operational-data to support that. We need > >> >> >> maintainers to say "no". But, what I think we can do is give > >> >> >> maintainers more varied ways to say it. The goal, de-escalate the > >> >> >> merge event as a declaration that the code quality/architecture > >> >> >> conversation is over. > >> >> >> > >> >> >> Release early, release often, and with care merge often. > >> >> > > >> >> > I think this falls foul of the "no regressions" rule. > >> >> > > >> >> > The kernel policy is that once the functionality gets to users, it > >> >> > cannot be taken away. Individual drivers in 'staging' manage to > >> >> > avoid this rule because that are clearly separate things. > >> >> > New system calls and attributes in sysfs etc seem to be much harder > >> >> > to "partially" release. > >> >> > >> >> My straw man is something like the following for driver "foo" > >> >> > >> >> if (gatekeeper_foo_new_awesome_sauce) > >> >> > >> >> do_new_thing(); > >> >> > >> >> Where setting gatekeeper_foo_new_awesome_sauce taints the kernel and > >> >> warns that there is no guarantee of this functionality being present > >> >> in the same form or at all going forward. > >> > > >> > Interesting idea. > >> > Trying to imagine how this might play out in practice.... > >> > > >> > You talk about "value delivered to users". But users tend to use > >> > applications, and applications are the users of kernel features. > >> > > >> > Will anyone bother writing or adapting an application to use a feature > >> > which is not guaranteed to hang around? > >> > Maybe they will, but will the users of the application know that it > >> > might stop working after a kernel upgrade? Maybe... > >> > > >> > Maybe if we had some concrete examples of features that could have been > >> > delayed using a gatekeeper. > >> > > >> > The one that springs to my mind is cgroups. Clearly useful, but > >> > clearly controversial. It appears that the original implementation was > >> > seriously flawed and Tejun is doing a massive amount of work to "fix" > >> > it, and this apparently will lead to API changes. And this is > >> > happening without any gatekeepers. Would it have been easier in some > >> > way with gatekeepers? ... I don't see how it would be, except that > >> > fewer people would have used cgroups, and then maybe we wouldn't have > >> > as much collective experience to know what the real problems were(?). > >> > > >> > I think that is the key. With a user-facing option, people will try it > >> > and probably cope if it disappears (though they might complain loudly > >> > and sign petitions declaring facebook to be the anti-$DEITY). However > >> > with kernel internal options, applications are unlikely to use them > >> > without some expectation of stability. So finding the problems would > >> > be a lot harder. > >> > > >> > Which doesn't mean that it can't work, but it would be nice if create > >> > some real life examples to see how it plays out in practice. > >> > >> Biased by my background of course, but I think driver development is > >> more amenable to this sort of approach. For drivers the kernel is in > >> many instances the application. For example, I currently have in my > >> review queue a patch set to add sata port multiplier support to > >> libsas. I hope I get the review done in time for merging it in 3.16. 
> >> But, what if I also had the option of saying "let's gatekeeper this > >> for a cycle". Users that care could start using it and reporting > >> bugs, and it would be clear that the implementation is provisional. > >> My opinion is that bug reports would attract deeper code review that > >> otherwise would not occur if the feature was simply delayed for a > >> cycle. > > > > There's more to that. > > > > The model you're referring to is only possible if all participants are > > employees of one company or otherwise members of one organization that > > has some kind of control over them. The kernel development is not done > > like that, though, so I'm afraid that the Facebook experience is not > > applicable here directly. > > > > For example, we take patches from pretty much everyone on the Internet. > > Does Facebook do that too? I don't think so. > > I'm struggling to see how this addresses my new libsas feature example? > > Simply, if an end user knows how to override a "gatekeeper" that user > can test features that we are otherwise still debating upstream. They > can of course also apply the patches directly, but I am proposing we > formalize a mechanism to encourage more experimentation in-tree. Isn't that what CONFIG_EXPERIMENTAL was for? Putting a similar mechanism in place would likely be abused the same way, and end up being enabled by default by distros at the end of the day. http://lwn.net/Articles/520867/ explains how experimental items should be handled, possibly depending on CONFIG_BROKEN (hopefully distros won't enable that one). Let's not forget that the kernel carries security implications. We might want to make it easier for users to enable experimental features, but not so easy that they could enable dangerous features without knowing it, or without realizing what they're doing. Out-of-tree patches should be pretty safe in that regard; an in-tree mechanism should take those constraints into account. We also need to decide on where to put the limit. Experimental features that haven't been properly reviewed can have side effects. They might make build robots fail even when the feature is disabled, because the implementation doesn't properly handle the disabled case. We would need to review experimental patches to prevent that from happening, and that could just put more burden on maintainers instead of helping them. > I'm fully aware we do not have the tactical data nor operational > control to run the kernel like a website, that's not my concern. My > concern is with expanding a maintainer's options for mitigating risk. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 38+ messages in thread
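One note on the "disabled case" point: the usual idiom for keeping build robots happy is to pair the experimental config symbol with static inline stubs, so call sites compile whether the feature is on or off. A minimal sketch, assuming a hypothetical CONFIG_FOO_EXPERIMENTAL symbol (which could depend on CONFIG_BROKEN, per the LWN article above):

    /* foo.h */
    #include <linux/errno.h>

    struct foo_device;

    #ifdef CONFIG_FOO_EXPERIMENTAL
    int foo_do_new_thing(struct foo_device *dev);
    #else
    /* Feature compiled out: callers build unchanged and simply
     * see "not supported" at run time. */
    static inline int foo_do_new_thing(struct foo_device *dev)
    {
            return -EOPNOTSUPP;
    }
    #endif

This only covers the compile-time half of the concern, of course; run-time side effects of half-reviewed code would still need review.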
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-21 23:03 ` Dan Williams 2014-05-21 23:40 ` Laurent Pinchart @ 2014-05-22 0:10 ` Rafael J. Wysocki 2014-05-22 15:48 ` Theodore Ts'o 2 siblings, 0 replies; 38+ messages in thread From: Rafael J. Wysocki @ 2014-05-22 0:10 UTC (permalink / raw) To: Dan Williams; +Cc: ksummit-discuss On Wednesday, May 21, 2014 04:03:49 PM Dan Williams wrote: > On Wed, May 21, 2014 at 4:06 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Wednesday, May 21, 2014 08:35:55 AM Dan Williams wrote: > >> On Wed, May 21, 2014 at 3:11 AM, NeilBrown <neilb@suse.de> wrote: > >> > On Wed, 21 May 2014 01:36:55 -0700 Dan Williams <dan.j.williams@intel.com> > >> > wrote: [cut] > > > > There's more to that. > > > > The model you're referring to is only possible if all participants are > > employees of one company or otherwise members of one organization that > > has some kind of control over them. The kernel development is not done > > like that, though, so I'm afraid that the Facebook experience is not > > applicable here directly. > > > > For example, we take patches from pretty much everyone on the Internet. > > Does Facebook do that too? I don't think so. > > > > I'm struggling to see how this addresses my new libsas feature example? What about security? What about preventing distros from shipping code that won't be accepted eventually? > Simply, if an end user knows how to override a "gatekeeper" that user > can test features that we are otherwise still debating upstream. They > can of course also apply the patches directly, but I am proposing we > formalize a mechanism to encourage more experimentation in-tree. So is staging not sufficient any more? > I'm fully aware we do not have the tactical data nor operational > control to run the kernel like a website, that's not my concern. My > concern is with expanding a maintainer's options for mitigating risk. What risk exactly do you mean? Rafael ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-21 23:03 ` Dan Williams 2014-05-21 23:40 ` Laurent Pinchart 2014-05-22 0:10 ` Rafael J. Wysocki @ 2014-05-22 15:48 ` Theodore Ts'o 2014-05-22 16:31 ` Dan Williams 2 siblings, 1 reply; 38+ messages in thread From: Theodore Ts'o @ 2014-05-22 15:48 UTC (permalink / raw) To: Dan Williams; +Cc: ksummit-discuss On Wed, May 21, 2014 at 04:03:49PM -0700, Dan Williams wrote: > Simply, if an end user knows how to override a "gatekeeper" that user > can test features that we are otherwise still debating upstream. They > can of course also apply the patches directly, but I am proposing we > formalize a mechanism to encourage more experimentation in-tree. > > I'm fully aware we do not have the tactical data nor operational > control to run the kernel like a website, that's not my concern. My > concern is with expanding a maintainer's options for mitigating risk. Various maintainers are doing this sort of thing already. For example, file system developers stage new file system features in precisely this way. Both xfs and ext4 have done this sort of thing, and certainly SuSE has used this technique with btrfs to only support those file system features which they are prepared to support. The problem is that this sort of gatekeeper is something a maintainer has to use in combination with existing techniques, and it doesn't necessarily accelerate development by all that much. In particular, if it has any kind of kernel ABI or file system format implications, we need to make sure the interfaces are set in stone before we can let it into the mainline kernel, even if it is not enabled by default. (Consider the avidity that userspace application developers can sometimes have for using even debugging interfaces such as ftrace, and the "no userspace breakages" rule. So not only do you have to worry about userspace applications not using a feature which is protected by a gatekeeper, you also have to worry about premature pervasive use of a feature such that you can't change the interface any more.) That by the way is the singular huge advantage that centralized code bases such as those found at Google and Facebook have --- if I need to make a kernel change for some feature that hasn't made it upstream yet, all of the users of some particular Google-specific kernel<->user space interface are under a single source tree, and while I do need to worry about staged deployments, I can be extremely confident that I can identify all of the users of a particular interface, and put in appropriate measures to update an interface. It still might take several release cadences, but that's typically far shorter than what it would take to obsolete a published upstream interface. As a result, I am much more willing to let an ugly, but operationally necessary new feature (such as, say, a netlink interface to export information about file system errors) into an internal Google kernel interface, but I'd be much less willing to let something like that go upstream, because while it's annoying to have to forward port such an out-of-tree patch, having to deal with fixing or upgrading a published interface is at least an order of magnitude or two more work. In addition, both Google and Facebook can afford to make changes that only need to worry about their data center environment, whereas an upstream change has to work in a much larger variety of situations and circumstances.
The bottom line is that just because you can do something at Facebook or Google does not necessarily mean that the same technique will port over easily into the upstream development model. - Ted ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 15:48 ` Theodore Ts'o @ 2014-05-22 16:31 ` Dan Williams 2014-05-22 17:38 ` Theodore Ts'o ` (3 more replies) 0 siblings, 4 replies; 38+ messages in thread From: Dan Williams @ 2014-05-22 16:31 UTC (permalink / raw) To: Theodore Ts'o; +Cc: ksummit-discuss On Thu, May 22, 2014 at 8:48 AM, Theodore Ts'o <tytso@mit.edu> wrote: > On Wed, May 21, 2014 at 04:03:49PM -0700, Dan Williams wrote: >> Simply, if an end user knows how to override a "gatekeeper" that user >> can test features that we are otherwise still debating upstream. They >> can of course also apply the patches directly, but I am proposing we >> formalize a mechanism to encourage more experimentation in-tree. >> >> I'm fully aware we do not have the tactical data nor operational >> control to run the kernel like a website, that's not my concern. My >> concern is with expanding a maintainer's options for mitigating risk. > > Various maintainers are doing this sort of thing already. For > example, file system developers stage new file system features in > precisely this way. Both xfs and ext4 have done this sort of thing, > and certainly SuSE has used this technique with btrfs to only support > those file system features which they are prepared to support. > > The problem is using this sort of gatekeeper is something that a > maintainer has to use in combination with existing techniques, and it > doesn't necessarliy accelerate development by all that much. In > particular, if it has any kind of kernel ABI or file system format > implications, we need to make sure the interfaces are set in stone > before we can let it into the mainline kernel, even if it is not > enabled by default. (Consider the avidity that userspace application > developers can sometimes have for using even debugging interfaces such > as ftrace, and the "no userspace breakages" rule. So not only do you > have to worry about userspace applicaitons not using a feature which > is protected by a gatekeeper, you also have to worry about premature > pervasive use of a feature such that you can't change the interface > any more.) I agree that something like this is prickly once it gets entangled with ABI concerns. But, I disagree with the speed argument... unless you believe -staging has not increased the velocity of kernel development? > That by the way is the singular huge advangtage that centralized code > bases such as those found at Google and Facebook have --- if I need to > make a kernel change for some feature that hasn't made it upstream > yet, all of the users of some particular Google-specific kernel<->user > space interface is under a single source tree, and while I do need to > worry about staged deployments, I can be extremely confident that I > can identify all of the users of a particular interface, and put in > appropriate measures to update an interface. It still might take > several release candences, but that's typically far shorter than what > it would take to obsolete a published upstream interface. Understood, but I'm not advocating that a system like this be used to support the Facebook/Google style kernel hacks to do things that only mega-datacenters care about. 
> As a result, I am much more willing to let a ugly, but operationally > necessary new feature (such as say a netlink interface to export > information about file system errors, for example) into an internal > Google kernel interface, but I'd be much less willing to let something > like that go upstream, because while it's annoying to have to forward > port such an out-of-tree patch, having to deal with fixing or > upgrading a published interface is at least an order or two more work. > > In addition, both Google and Facebook can afford to make changes that > only need to worry about their data center environment, where as an > upstream change has to work in a much larger variety of situations and > circumstances. > > The bottom line is just because you can do something at Facebook or > Google does not necessarily mean that the same technique will port > over easily into the upstream development model. Neil already disabused me of the idea that a "gatekeeper" could be used to beneficial effect in the core kernel, and I can see it's equally difficult to use this in filesystems that need to be careful of ABI changes. However, nothing presented so far has swayed me from my top of mind concern which is the ability to ship pre-production driver features in the upstream kernel. I'm thinking of it as "-staging for otherwise established drivers". ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 16:31 ` Dan Williams @ 2014-05-22 17:38 ` Theodore Ts'o 2014-05-22 18:42 ` Dan Williams ` (2 subsequent siblings) 3 siblings, 0 replies; 38+ messages in thread From: Theodore Ts'o @ 2014-05-22 17:38 UTC (permalink / raw) To: Dan Williams; +Cc: ksummit-discuss On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: > Neil already disabused me of the idea that a "gatekeeper" could be > used to beneficial effect in the core kernel, and I can see it's > equally difficult to use this in filesystems that need to be careful > of ABI changes. However, nothing presented so far has swayed me from > my top of mind concern which is the ability to ship pre-production > driver features in the upstream kernel. I'm thinking of it as > "-staging for otherwise established drivers". In the case where you are just adding some additional hardware enablement for some newer version of some chipset, I can see the applicability. But if the new feature also requires new core code functionality (for example, some smarter way of handling interrupt mitigation or interrupt steering), the "gatekeeper" approach can also get problematic, for the reasons Neil outlined. For example, I can remember lots of serial driver enhancements that required core tty layer changes in order to be effective. (In fact I had a friendly competition with the FreeBSD tty maintainer many years ago, but one of the reasons why I was able to get significantly better improvements with Linux was because the FreeBSD core team back then viewed the architecture from BSD 4.3 as having been handed down from the mountain top as if from Moses....) So this is why I'm wondering how commonly applicable this particular technique might be, and if it's restricted to individual driver code, is there anything special we really need to do to encourage this. After all, device driver authors could use a sysfs file to do this sort of thing today, right? - Ted ^ permalink raw reply [flat|nested] 38+ messages in thread
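As a sketch of the sysfs route Ted mentions: an established driver "foo" could gate an experimental code path behind a per-device boolean attribute today. The foo_experimental flag and the attribute name are invented for illustration; DEVICE_ATTR_RW(), strtobool(), and device_create_file() are existing kernel APIs:

    #include <linux/device.h>
    #include <linux/kernel.h>
    #include <linux/string.h>

    /* Gate checked by the driver before taking the experimental path. */
    static bool foo_experimental;

    static ssize_t experimental_show(struct device *dev,
                                     struct device_attribute *attr, char *buf)
    {
            return sprintf(buf, "%d\n", foo_experimental);
    }

    static ssize_t experimental_store(struct device *dev,
                                      struct device_attribute *attr,
                                      const char *buf, size_t count)
    {
            bool val;

            if (strtobool(buf, &val) < 0)
                    return -EINVAL;
            foo_experimental = val;
            return count;
    }
    static DEVICE_ATTR_RW(experimental);

    /* In the driver's probe routine:
     *      device_create_file(dev, &dev_attr_experimental);
     */

An administrator opts in with "echo 1 > /sys/devices/.../experimental", and the driver checks foo_experimental before calling into the new code -- no new kernel infrastructure required, which is Ted's point.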
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 16:31 ` Dan Williams 2014-05-22 17:38 ` Theodore Ts'o @ 2014-05-22 18:42 ` Dan Williams 2014-05-22 19:06 ` Chris Mason 2014-05-22 20:31 ` Dan Carpenter 2014-05-23 2:13 ` Greg KH 3 siblings, 1 reply; 38+ messages in thread From: Dan Williams @ 2014-05-22 18:42 UTC (permalink / raw) To: Theodore Ts'o; +Cc: ksummit-discuss On Thu, May 22, 2014 at 9:31 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Thu, May 22, 2014 at 8:48 AM, Theodore Ts'o <tytso@mit.edu> wrote: >> On Wed, May 21, 2014 at 04:03:49PM -0700, Dan Williams wrote: >>> Simply, if an end user knows how to override a "gatekeeper" that user >>> can test features that we are otherwise still debating upstream. They >>> can of course also apply the patches directly, but I am proposing we >>> formalize a mechanism to encourage more experimentation in-tree. >>> >>> I'm fully aware we do not have the tactical data nor operational >>> control to run the kernel like a website, that's not my concern. My >>> concern is with expanding a maintainer's options for mitigating risk. >> >> Various maintainers are doing this sort of thing already. For >> example, file system developers stage new file system features in >> precisely this way. Both xfs and ext4 have done this sort of thing, >> and certainly SuSE has used this technique with btrfs to only support >> those file system features which they are prepared to support. >> >> The problem is using this sort of gatekeeper is something that a >> maintainer has to use in combination with existing techniques, and it >> doesn't necessarliy accelerate development by all that much. In >> particular, if it has any kind of kernel ABI or file system format >> implications, we need to make sure the interfaces are set in stone >> before we can let it into the mainline kernel, even if it is not >> enabled by default. (Consider the avidity that userspace application >> developers can sometimes have for using even debugging interfaces such >> as ftrace, and the "no userspace breakages" rule. So not only do you >> have to worry about userspace applicaitons not using a feature which >> is protected by a gatekeeper, you also have to worry about premature >> pervasive use of a feature such that you can't change the interface >> any more.) > > I agree that something like this is prickly once it gets entangled > with ABI concerns. But, I disagree with the speed argument... unless > you believe -staging has not increased the velocity of kernel > development? > >> That by the way is the singular huge advangtage that centralized code >> bases such as those found at Google and Facebook have --- if I need to >> make a kernel change for some feature that hasn't made it upstream >> yet, all of the users of some particular Google-specific kernel<->user >> space interface is under a single source tree, and while I do need to >> worry about staged deployments, I can be extremely confident that I >> can identify all of the users of a particular interface, and put in >> appropriate measures to update an interface. It still might take >> several release candences, but that's typically far shorter than what >> it would take to obsolete a published upstream interface. > > Understood, but I'm not advocating that a system like this be used to > support the Facebook/Google style kernel hacks to do things that only > mega-datacenters care about. 
> >> As a result, I am much more willing to let a ugly, but operationally >> necessary new feature (such as say a netlink interface to export >> information about file system errors, for example) into an internal >> Google kernel interface, but I'd be much less willing to let something >> like that go upstream, because while it's annoying to have to forward >> port such an out-of-tree patch, having to deal with fixing or >> upgrading a published interface is at least an order or two more work. >> >> In addition, both Google and Facebook can afford to make changes that >> only need to worry about their data center environment, where as an >> upstream change has to work in a much larger variety of situations and >> circumstances. >> >> The bottom line is just because you can do something at Facebook or >> Google does not necessarily mean that the same technique will port >> over easily into the upstream development model. > > Neil already disabused me of the idea that a "gatekeeper" could be > used to beneficial effect in the core kernel, and I can see it's > equally difficult to use this in filesystems that need to be careful > of ABI changes. However, nothing presented so far has swayed me from > my top of mind concern which is the ability to ship pre-production > driver features in the upstream kernel. I'm thinking of it as > "-staging for otherwise established drivers". Interesting quote / counterpoint from Dave Chinner that supports the "don't do this for filesystems!" sentiment: "The development of btrfs has shown that moving prototype filesystems into the main kernel tree does not lead to stability, performance or production readiness any faster than if they stayed as an out-of-tree module until most of the development was complete. If anything, merging into mainline reduces the speed at which a filesystem can be brought to being feature complete and production ready." The care that must be taken with merging experiments is to avoid accidentally leaking promises to users that you don't intend to keep. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 18:42 ` Dan Williams @ 2014-05-22 19:06 ` Chris Mason 0 siblings, 0 replies; 38+ messages in thread From: Chris Mason @ 2014-05-22 19:06 UTC (permalink / raw) To: Dan Williams, Theodore Ts'o; +Cc: ksummit-discuss On 05/22/2014 02:42 PM, Dan Williams wrote: > On Thu, May 22, 2014 at 9:31 AM, Dan Williams <dan.j.williams@intel.com> wrote: > > Interesting quote / counterpoint from Dave Chinner that supports the > "don't do this for filesystems!" sentiment: > > "The development of btrfs has shown that moving prototype filesystems > into the main kernel tree does not lead stability, performance or > production readiness any faster than if they stayed as an out-of-tree > module until most of the development was complete. If anything, > merging into mainline reduces the speed at which a filesystem can be > brought to being feature complete and production ready." > > The care that must be taken with merging experiments is accidentally > leaking promises that you don't intend to keep to users. Not too surprising, but I disagree with Dave here. Having things upstream earlier increases community ownership, and it helps reduce silos of private code in the project. Btrfs does have its warts, but it also looks like a Linux filesystem. Out of tree, it would be something different, and certainly less than it is now. -chris ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 16:31 ` Dan Williams 2014-05-22 17:38 ` Theodore Ts'o 2014-05-22 18:42 ` Dan Williams @ 2014-05-22 20:31 ` Dan Carpenter 2014-05-22 20:56 ` Geert Uytterhoeven 2014-05-23 2:13 ` Greg KH 3 siblings, 1 reply; 38+ messages in thread From: Dan Carpenter @ 2014-05-22 20:31 UTC (permalink / raw) To: Dan Williams; +Cc: ksummit-discuss On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: > I agree that something like this is prickly once it gets entangled > with ABI concerns. But, I disagree with the speed argument... unless > you believe -staging has not increased the velocity of kernel > development? Staging is good because it brings more developers, but in many cases it is a slowdown. Merged code has stricter rules where you have to write reviewable patches. If there is a bug early in a patch series then you can't just fix it in a later patch; you need to redo the whole series. Porting a wifi driver to a different wireless stack is difficult/impossible when you have to write bisectable code. I often think that developers would be better off just working like mad to fix things up outside the tree. The good thing about staging is that, before it existed, there were all these drivers out there which people were using but which were never going to be merged into the kernel. Now we merge them and try to clean them up, so it is a step in the right direction. regards, dan carpenter ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 20:31 ` Dan Carpenter @ 2014-05-22 20:56 ` Geert Uytterhoeven 2014-05-23 6:21 ` James Bottomley 0 siblings, 1 reply; 38+ messages in thread From: Geert Uytterhoeven @ 2014-05-22 20:56 UTC (permalink / raw) To: Dan Carpenter; +Cc: ksummit-discuss On Thu, May 22, 2014 at 10:31 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote: > On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: >> I agree that something like this is prickly once it gets entangled >> with ABI concerns. But, I disagree with the speed argument... unless >> you believe -staging has not increased the velocity of kernel >> development? > > Staging is good because it brings more developers, but in many cases it > is a slow down. Merged codes has stricter rules where you have to write > reviewable patches. If there is a bug early in a patch series then you > can't just fix it in a later patch, you need to redo the whole series. In theory... These days many fixes end up as separate commits in various subsystem trees, due to "no rebase" rules and other regulations. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 20:56 ` Geert Uytterhoeven @ 2014-05-23 6:21 ` James Bottomley 2014-05-23 14:11 ` John W. Linville 0 siblings, 1 reply; 38+ messages in thread From: James Bottomley @ 2014-05-23 6:21 UTC (permalink / raw) To: Geert Uytterhoeven; +Cc: ksummit-discuss, Dan Carpenter On Thu, 2014-05-22 at 22:56 +0200, Geert Uytterhoeven wrote: > On Thu, May 22, 2014 at 10:31 PM, Dan Carpenter > <dan.carpenter@oracle.com> wrote: > > On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: > >> I agree that something like this is prickly once it gets entangled > >> with ABI concerns. But, I disagree with the speed argument... unless > >> you believe -staging has not increased the velocity of kernel > >> development? > > > > Staging is good because it brings more developers, but in many cases it > > is a slow down. Merged codes has stricter rules where you have to write > > reviewable patches. If there is a bug early in a patch series then you > > can't just fix it in a later patch, you need to redo the whole series. > > In theory... > > These days many fixes end up as separate commits in various subsystem > trees, due to "no rebase" rules and other regulations. No, pretty much in practice. I've no qualms about dropping a patch series if one of the git tree tests shows problems and, since I have a mostly linear tree, that means a rebase. I also don't believe in "preserving" history that is simply bug fixes that should have been in the series. Sometimes, if the fix took a while to track down, I might keep the separate patch for credit + learning, but most of the time I'd fold it into a commit and annotate the commit. James ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-23 6:21 ` James Bottomley @ 2014-05-23 14:11 ` John W. Linville 2014-05-24 9:14 ` James Bottomley 0 siblings, 1 reply; 38+ messages in thread From: John W. Linville @ 2014-05-23 14:11 UTC (permalink / raw) To: James Bottomley; +Cc: Dan Carpenter, ksummit-discuss On Thu, May 22, 2014 at 11:21:35PM -0700, James Bottomley wrote: > On Thu, 2014-05-22 at 22:56 +0200, Geert Uytterhoeven wrote: > > On Thu, May 22, 2014 at 10:31 PM, Dan Carpenter > > <dan.carpenter@oracle.com> wrote: > > > On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: > > >> I agree that something like this is prickly once it gets entangled > > >> with ABI concerns. But, I disagree with the speed argument... unless > > >> you believe -staging has not increased the velocity of kernel > > >> development? > > > > > > Staging is good because it brings more developers, but in many cases it > > > is a slow down. Merged codes has stricter rules where you have to write > > > reviewable patches. If there is a bug early in a patch series then you > > > can't just fix it in a later patch, you need to redo the whole series. > > > > In theory... > > > > These days many fixes end up as separate commits in various subsystem > > trees, due to "no rebase" rules and other regulations. > > No, pretty much in practise. I've no qualms about dropping a patch > series if one of the git tree tests shows problems and, since I have a > mostly linear tree, that means a rebase. > > I also don't believe in "preserving" history which is simply bug fixes > that should have been in the series. Sometimes, if the fix took a while > to track down, I might keep the separate patch for credit + learning, > but most of the time I'd fold it into a commit and annotate the commit. That's all well and good, but rebasing causes a lot of pain. This is particularly true when you have downstream trees. In any case, bugs will eventually show up -- probably on the day after you merge the 'final' series. Hopefully those are not 'brown paper bag' bugs, but you can only stall a series so long in hopes of shaking those out. You can only extend yourself so far in pursuit of bisectability. John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-23 14:11 ` John W. Linville @ 2014-05-24 9:14 ` James Bottomley 2014-05-24 19:19 ` Geert Uytterhoeven 0 siblings, 1 reply; 38+ messages in thread From: James Bottomley @ 2014-05-24 9:14 UTC (permalink / raw) To: John W. Linville; +Cc: ksummit-discuss, Dan Carpenter On Fri, 2014-05-23 at 10:11 -0400, John W. Linville wrote: > On Thu, May 22, 2014 at 11:21:35PM -0700, James Bottomley wrote: > > On Thu, 2014-05-22 at 22:56 +0200, Geert Uytterhoeven wrote: > > > On Thu, May 22, 2014 at 10:31 PM, Dan Carpenter > > > <dan.carpenter@oracle.com> wrote: > > > > On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: > > > >> I agree that something like this is prickly once it gets entangled > > > >> with ABI concerns. But, I disagree with the speed argument... unless > > > >> you believe -staging has not increased the velocity of kernel > > > >> development? > > > > > > > > Staging is good because it brings more developers, but in many cases it > > > > is a slow down. Merged codes has stricter rules where you have to write > > > > reviewable patches. If there is a bug early in a patch series then you > > > > can't just fix it in a later patch, you need to redo the whole series. > > > > > > In theory... > > > > > > These days many fixes end up as separate commits in various subsystem > > > trees, due to "no rebase" rules and other regulations. > > > > No, pretty much in practise. I've no qualms about dropping a patch > > series if one of the git tree tests shows problems and, since I have a > > mostly linear tree, that means a rebase. > > > > I also don't believe in "preserving" history which is simply bug fixes > > that should have been in the series. Sometimes, if the fix took a while > > to track down, I might keep the separate patch for credit + learning, > > but most of the time I'd fold it into a commit and annotate the commit. > > That's all well and good, but rebasing causes a lot of pain. Not usually if you manage it right. > This is particularly true when you have downstream trees. What I find is that people rarely actually need to base development on my tree as upstream. We do sometimes get the odd entangled patch (code that changes something that changed in my tree), but we haven't had that for a while now. The rule therefore is use an upstream Linus tree to develop unless you specifically have entangled patches. If you need to test with my tree, you can still pull it in as a merge. I also have specific methodologies where I keep head and tail branches of my trees, so for <x> development branch I have an <x>-base branch as well, so I can simply do a git checkout <x> git rebase --onto origin/master <x>-base git branch -f <x>-base origin/master > In any case, bugs will eventually show-up -- probably on the day after > you merge the 'final' series. Hopefully those are not 'brown paper bag' > bugs, but you can only stall a series so long in hopes of shaking > those out. You can only extend yourself so far in pursuit of bisectability. Right, you have to have a "history commit" point ... for me that's when I send the tree to Linus ... then the history becomes immutable and any breakage discovered afterwards has to be fixed by separate patches. James ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-24 9:14 ` James Bottomley @ 2014-05-24 19:19 ` Geert Uytterhoeven 0 siblings, 0 replies; 38+ messages in thread From: Geert Uytterhoeven @ 2014-05-24 19:19 UTC (permalink / raw) To: James Bottomley; +Cc: Dan Carpenter, ksummit-discuss Hi James, On Sat, May 24, 2014 at 11:14 AM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > I also have specific methodologies where I keep head and tail branches > of my trees, so for <x> development branch I have an <x>-base branch as > well, so I can simply do a > > git checkout <x> > git rebase --onto origin/master <x>-base > git branch -f <x>-base origin/master If your origin/master is only forwarding (i.e. never rebased), you can do without the <x>-base branch, as it will always point somewhere into the history of origin/master. Git is smart enough so "git rebase origin/master <x>" will do the right thing. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-22 16:31 ` Dan Williams ` (2 preceding siblings ...) 2014-05-22 20:31 ` Dan Carpenter @ 2014-05-23 2:13 ` Greg KH 2014-05-23 3:03 ` Dan Williams 2014-05-23 14:02 ` Josh Boyer 3 siblings, 2 replies; 38+ messages in thread From: Greg KH @ 2014-05-23 2:13 UTC (permalink / raw) To: Dan Williams; +Cc: ksummit-discuss On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: > On Thu, May 22, 2014 at 8:48 AM, Theodore Ts'o <tytso@mit.edu> wrote: > > On Wed, May 21, 2014 at 04:03:49PM -0700, Dan Williams wrote: > >> Simply, if an end user knows how to override a "gatekeeper" that user > >> can test features that we are otherwise still debating upstream. They > >> can of course also apply the patches directly, but I am proposing we > >> formalize a mechanism to encourage more experimentation in-tree. > >> > >> I'm fully aware we do not have the tactical data nor operational > >> control to run the kernel like a website, that's not my concern. My > >> concern is with expanding a maintainer's options for mitigating risk. > > > > Various maintainers are doing this sort of thing already. For > > example, file system developers stage new file system features in > > precisely this way. Both xfs and ext4 have done this sort of thing, > > and certainly SuSE has used this technique with btrfs to only support > > those file system features which they are prepared to support. > > > > The problem is using this sort of gatekeeper is something that a > > maintainer has to use in combination with existing techniques, and it > > doesn't necessarliy accelerate development by all that much. In > > particular, if it has any kind of kernel ABI or file system format > > implications, we need to make sure the interfaces are set in stone > > before we can let it into the mainline kernel, even if it is not > > enabled by default. (Consider the avidity that userspace application > > developers can sometimes have for using even debugging interfaces such > > as ftrace, and the "no userspace breakages" rule. So not only do you > > have to worry about userspace applicaitons not using a feature which > > is protected by a gatekeeper, you also have to worry about premature > > pervasive use of a feature such that you can't change the interface > > any more.) > > I agree that something like this is prickly once it gets entangled > with ABI concerns. But, I disagree with the speed argument... unless > you believe -staging has not increased the velocity of kernel > development? As the maintainer of drivers/staging/ I don't think it has increased the speed of the development of other parts of the kernel at all. Do you have numbers that show otherwise? > Neil already disabused me of the idea that a "gatekeeper" could be > used to beneficial effect in the core kernel, and I can see it's > equally difficult to use this in filesystems that need to be careful > of ABI changes. However, nothing presented so far has swayed me from > my top of mind concern which is the ability to ship pre-production > driver features in the upstream kernel. I'm thinking of it as > "-staging for otherwise established drivers". The thing you need to realize is that the large majority of people who would ever use that new "feature" will not until it ends up in an "enterprise" kernel release. And that will not be for another few years, so while you think you got it all right, we really don't know who is using it, or how well it works, for a few years. 
But feel free to try to do this in your subsystem; as Ted points out, it can be done for some things, but be careful about thinking things are ok when you don't have many real users :) thanks, greg k-h ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-23 2:13 ` Greg KH @ 2014-05-23 3:03 ` Dan Williams 2014-05-23 7:44 ` Greg KH 1 sibling, 1 reply; 38+ messages in thread From: Dan Williams @ 2014-05-23 3:03 UTC (permalink / raw) To: Greg KH; +Cc: ksummit-discuss On Thu, May 22, 2014 at 7:13 PM, Greg KH <greg@kroah.com> wrote: > On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: >> On Thu, May 22, 2014 at 8:48 AM, Theodore Ts'o <tytso@mit.edu> wrote: >> > On Wed, May 21, 2014 at 04:03:49PM -0700, Dan Williams wrote: >> >> Simply, if an end user knows how to override a "gatekeeper" that user >> >> can test features that we are otherwise still debating upstream. They >> >> can of course also apply the patches directly, but I am proposing we >> >> formalize a mechanism to encourage more experimentation in-tree. >> >> >> >> I'm fully aware we do not have the tactical data nor operational >> >> control to run the kernel like a website, that's not my concern. My >> >> concern is with expanding a maintainer's options for mitigating risk. >> > >> > Various maintainers are doing this sort of thing already. For >> > example, file system developers stage new file system features in >> > precisely this way. Both xfs and ext4 have done this sort of thing, >> > and certainly SuSE has used this technique with btrfs to only support >> > those file system features which they are prepared to support. >> > >> > The problem is using this sort of gatekeeper is something that a >> > maintainer has to use in combination with existing techniques, and it >> > doesn't necessarliy accelerate development by all that much. In >> > particular, if it has any kind of kernel ABI or file system format >> > implications, we need to make sure the interfaces are set in stone >> > before we can let it into the mainline kernel, even if it is not >> > enabled by default. (Consider the avidity that userspace application >> > developers can sometimes have for using even debugging interfaces such >> > as ftrace, and the "no userspace breakages" rule. So not only do you >> > have to worry about userspace applicaitons not using a feature which >> > is protected by a gatekeeper, you also have to worry about premature >> > pervasive use of a feature such that you can't change the interface >> > any more.) >> >> I agree that something like this is prickly once it gets entangled >> with ABI concerns. But, I disagree with the speed argument... unless >> you believe -staging has not increased the velocity of kernel >> development? > > As the maintainer of drivers/staging/ I don't think it has increased the > speed of the development of other parts of the kernel at all. Do you > have numbers that show otherwise? Well, I'm defining velocity as value delivered to end users and amount of testing that can be distributed by being upstream. By that definition -staging does make us faster simply because mainline releases have more drivers than they would otherwise, and it attracts more developers to test and clean up the code. >> Neil already disabused me of the idea that a "gatekeeper" could be >> used to beneficial effect in the core kernel, and I can see it's >> equally difficult to use this in filesystems that need to be careful >> of ABI changes. However, nothing presented so far has swayed me from >> my top of mind concern which is the ability to ship pre-production >> driver features in the upstream kernel.
I'm thinking of it as >> "-staging for otherwise established drivers". > > The thing you need to realize is that the large majority of people who > would ever use that new "feature" will not until it ends up in an > "enterprise" kernel release. And that will not be for another few > years, so while you think you got it all right, we really don't know who > is using it, or how well it works, for a few years. > > But feel free to try to do this in your subsystem; as Ted points out, it > can be done for some things, but be careful about thinking things are ok > when you don't have many real users :) > Point taken. However, if this is the case, why is there so much tension around some merge events, especially in cases where there is low risk of regression? We seem to aim for perfection in merging, and that is specifically the latency I am targeting: a "this feature is behind a gatekeeper" release valve for the pressure to not merge. If things stay behind a gatekeeper too long they get reverted. Would that modulate the latency to "ack" in any meaningful way? ^ permalink raw reply [flat|nested] 38+ messages in thread
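For concreteness, here is one hypothetical shape the per-driver gatekeeper Dan describes could take. This is a sketch only: the "new_awesome_sauce" name comes from Dan's straw man elsewhere in the thread, while the module parameter, the do_new_thing() helper, and the choice of TAINT_USER as the taint flag are invented for illustration and are not existing kernel code.

	/*
	 * Hypothetical gatekeeper for a provisional feature in driver "foo".
	 * The feature defaults to off; turning it on taints the kernel, much
	 * as loading a staging driver does.
	 */
	#include <linux/kernel.h>
	#include <linux/module.h>

	static bool gatekeeper_new_awesome_sauce;	/* maintainer's default: "no" */
	module_param_named(new_awesome_sauce, gatekeeper_new_awesome_sauce,
			   bool, 0444);
	MODULE_PARM_DESC(new_awesome_sauce,
			 "Provisional feature; may change form or disappear");

	static void do_new_thing(void)
	{
		/* the provisional behavior under review goes here */
	}

	static void foo_handle_request(void)
	{
		if (gatekeeper_new_awesome_sauce) {
			/* make the opt-in visible in any subsequent oops */
			pr_warn_once("foo: provisional feature enabled, tainting kernel\n");
			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
			do_new_thing();
		}
	}

Booting with foo.new_awesome_sauce=1 would then be the moral equivalent of opting into a staging driver: the user gets the feature early, the maintainer gets bug reports, and the taint makes the experiment visible in any crash that follows.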
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-23 3:03 ` Dan Williams @ 2014-05-23 7:44 ` Greg KH 0 siblings, 0 replies; 38+ messages in thread From: Greg KH @ 2014-05-23 7:44 UTC (permalink / raw) To: Dan Williams; +Cc: ksummit-discuss On Thu, May 22, 2014 at 08:03:32PM -0700, Dan Williams wrote: > >> Neil already disabused me of the idea that a "gatekeeper" could be > >> used to beneficial effect in the core kernel, and I can see it's > >> equally difficult to use this in filesystems that need to be careful > >> of ABI changes. However, nothing presented so far has swayed me from > >> my top-of-mind concern, which is the ability to ship pre-production > >> driver features in the upstream kernel. I'm thinking of it as > >> "-staging for otherwise established drivers". > > > > The thing you need to realize is that the large majority of people who > > would ever use that new "feature" will not until it ends up in an > > "enterprise" kernel release. And that will not be for another few > > years, so while you think you got it all right, we really don't know who > > is using it, or how well it works, for a few years. > > > > But feel free to try to do this in your subsystem; as Ted points out, it > > can be done for some things, but be careful about thinking things are ok > > when you don't have many real users :) > > > > Point taken. > > However, if this is the case, why is there so much tension around some > merge events, especially in cases where there is low risk of > regression? What "tension" are you speaking of? Getting new APIs correct before we do a release? Or something else? I didn't see any specific examples mentioned in this thread, but I might have missed it. > We seem to aim for perfection in merging, and that is > specifically the latency I am targeting: a "this feature is behind > a gatekeeper" release valve for the pressure to not merge. If things > stay behind a gatekeeper too long they get reverted. Would that > modulate the latency to "ack" in any meaningful way? For a filesystem, or a driver, as stated, this might work. For a syscall, or a new subsystem API to userspace, that isn't going to work for the above-mentioned reasons. See the cgroups interface for one example of how long it took for people to actually start to use it (years), and then, once we realized just how bad the interface really was for real-world usages, it was too late, as people were already using it, so we have to keep those interfaces around for an indefinite time before they can be removed, if ever. thanks, greg k-h ^ permalink raw reply [flat|nested] 38+ messages in thread
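The filesystem technique Ted and Greg refer to looks roughly like the following sketch, modeled loosely on the ext4/xfs incompat-feature-flag pattern. The names are invented and this is not code from either filesystem: a feature baked into the on-disk format is refused at mount time unless the implementation explicitly claims support for it, so only users who opted in at mkfs time ever see the new behavior.

	#include <linux/fs.h>
	#include <linux/printk.h>
	#include <linux/types.h>

	/* invented flag values; real filesystems keep these in the superblock */
	#define MYFS_FEATURE_INCOMPAT_NEW_FORMAT	0x0400
	#define MYFS_SUPPORTED_INCOMPAT \
		(MYFS_FEATURE_INCOMPAT_NEW_FORMAT /* | ...older, stable flags... */)

	static int myfs_check_features(struct super_block *sb, u32 incompat)
	{
		u32 unsupported = incompat & ~MYFS_SUPPORTED_INCOMPAT;

		if (unsupported) {
			/* a qualified "no": the code ships, the format does not mount */
			pr_err("myfs: unsupported incompat features (0x%x)\n",
			       unsupported);
			return -EINVAL;
		}
		return 0;
	}

Dropping a flag from MYFS_SUPPORTED_INCOMPAT is then a one-line way for a distro or a maintainer to keep saying "no" to an on-disk feature while the implementation itself ships, which is essentially what SuSE did with btrfs.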
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-23 2:13 ` Greg KH 2014-05-23 3:03 ` Dan Williams @ 2014-05-23 14:02 ` Josh Boyer 1 sibling, 0 replies; 38+ messages in thread From: Josh Boyer @ 2014-05-23 14:02 UTC (permalink / raw) To: Greg KH; +Cc: ksummit-discuss On Thu, May 22, 2014 at 10:13 PM, Greg KH <greg@kroah.com> wrote: > On Thu, May 22, 2014 at 09:31:44AM -0700, Dan Williams wrote: >> On Thu, May 22, 2014 at 8:48 AM, Theodore Ts'o <tytso@mit.edu> wrote: >> > On Wed, May 21, 2014 at 04:03:49PM -0700, Dan Williams wrote: >> >> Simply, if an end user knows how to override a "gatekeeper" that user >> >> can test features that we are otherwise still debating upstream. They >> >> can of course also apply the patches directly, but I am proposing we >> >> formalize a mechanism to encourage more experimentation in-tree. >> >> >> >> I'm fully aware we do not have the tactical data nor operational >> >> control to run the kernel like a website; that's not my concern. My >> >> concern is with expanding a maintainer's options for mitigating risk. >> > >> > Various maintainers are doing this sort of thing already. For >> > example, file system developers stage new file system features in >> > precisely this way. Both xfs and ext4 have done this sort of thing, >> > and certainly SuSE has used this technique with btrfs to only support >> > those file system features which they are prepared to support. >> > >> > The problem is using this sort of gatekeeper is something that a >> > maintainer has to use in combination with existing techniques, and it >> > doesn't necessarily accelerate development by all that much. In >> > particular, if it has any kind of kernel ABI or file system format >> > implications, we need to make sure the interfaces are set in stone >> > before we can let it into the mainline kernel, even if it is not >> > enabled by default. (Consider the avidity that userspace application >> > developers can sometimes have for using even debugging interfaces such >> > as ftrace, and the "no userspace breakages" rule. So not only do you >> > have to worry about userspace applications not using a feature which >> > is protected by a gatekeeper, you also have to worry about premature >> > pervasive use of a feature such that you can't change the interface >> > any more.) >> >> I agree that something like this is prickly once it gets entangled >> with ABI concerns. But, I disagree with the speed argument... unless >> you believe -staging has not increased the velocity of kernel >> development? > > As the maintainer of drivers/staging/ I don't think it has increased the > speed of the development of other parts of the kernel at all. Do you > have numbers that show otherwise? > >> Neil already disabused me of the idea that a "gatekeeper" could be >> used to beneficial effect in the core kernel, and I can see it's >> equally difficult to use this in filesystems that need to be careful >> of ABI changes. However, nothing presented so far has swayed me from >> my top-of-mind concern, which is the ability to ship pre-production >> driver features in the upstream kernel. I'm thinking of it as >> "-staging for otherwise established drivers". > > The thing you need to realize is that the large majority of people who > would ever use that new "feature" will not until it ends up in an > "enterprise" kernel release.
And that will not be for another few > years, so while you think you got it all right, we really don't know who > is using it, or how well it works, for a few years. I don't entirely agree with that. Many of the non-enterprise distros are rebasing more frequently, and collectively their user bases are pretty large. Fedora, Arch, Ubuntu, and OpenSuSE get requests to enable new features all the time. If you consider the distros that have an enterprise downstream (e.g. Fedora, OpenSuSE), you even get people picking those up and using them as previews for the next EL release. So yes, EL kernels have massive user bases and they tend to adopt very slowly. However, as soon as code is in a released upstream kernel, a non-trivial number of people are going to be able to use it. If you factor in hot-topic things like containers (docker docker docker), those features are requested in the non-EL distros very rapidly (sometimes even before they're merged). Maybe Dan's case isn't hot-topic enough to match this, but there is certainly the possibility of early adoption and usage by a large number of users as soon as code lands. josh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-21 15:35 ` Dan Williams 2014-05-21 23:06 ` Rafael J. Wysocki @ 2014-05-21 23:48 ` NeilBrown 2014-05-22 4:04 ` Dan Williams 1 sibling, 1 reply; 38+ messages in thread From: NeilBrown @ 2014-05-21 23:48 UTC (permalink / raw) To: Dan Williams; +Cc: ksummit-discuss [-- Attachment #1: Type: text/plain, Size: 8015 bytes --] On Wed, 21 May 2014 08:35:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > On Wed, May 21, 2014 at 3:11 AM, NeilBrown <neilb@suse.de> wrote: > > On Wed, 21 May 2014 01:36:55 -0700 Dan Williams <dan.j.williams@intel.com> > > wrote: > > > >> On Wed, May 21, 2014 at 1:25 AM, NeilBrown <neilb@suse.de> wrote: > >> > On Wed, 21 May 2014 00:48:48 -0700 Dan Williams <dan.j.williams@intel.com> > >> > wrote: > >> > > >> >> On Fri, May 16, 2014 at 8:04 AM, Chris Mason <clm@fb.com> wrote: > >> >> > -----BEGIN PGP SIGNED MESSAGE----- > >> >> > Hash: SHA1 > >> >> > > >> >> > On 05/15/2014 10:56 PM, NeilBrown wrote: > >> >> >> On Thu, 15 May 2014 16:13:58 -0700 Dan Williams > >> >> >> <dan.j.williams@gmail.com> wrote: > >> >> >> > >> >> >>> What would it take and would we even consider moving 2x faster > >> >> >>> than we are now? > >> >> >> > >> >> >> Hi Dan, you seem to be suggesting that there is some limit other > >> >> >> than "competent engineering time" which is slowing Linux "progress" > >> >> >> down. > >> >> >> > >> >> >> Are you really suggesting that? What might these other limits be? > >> >> >> > >> >> >> Certainly there are limits to minimum gap between conceptualisation > >> >> >> and release (at least one release cycle), but is there really a > >> >> >> limit to the parallelism that can be achieved? > >> >> > > >> >> > I haven't compared the FB commit rates with the kernel, but I'll > >> >> > pretend Dan's basic thesis is right and talk about which parts of the > >> >> > Facebook model may move faster than the kernel. > >> >> > > >> >> > The Facebook model is pretty similar to the way the kernel works. The merge > >> >> > window lasts a few days and the major releases are every week, but > >> >> > overall it isn't too far off. > >> >> > > >> >> > One big difference is that we have a centralized tool for > >> >> > reviewing the patches, and once it has been reviewed by a specific > >> >> > number of people, you push it in. > >> >> > > >> >> > The patch submission tool runs the patch through lint and various > >> >> > static analysis to make sure it follows proper coding style and > >> >> > doesn't include patterns of known bugs. This cuts down on the review > >> >> > work because the silly coding style mistakes are gone before it gets > >> >> > to the tool. > >> >> > > >> >> > When you put in a patch, you have to put in reviewers, and they get a > >> >> > little notification that your patch needs review. Once the reviewers > >> >> > are happy, you push the patch in. > >> >> > > >> >> > The biggest difference: there are no maintainers. If I want to go > >> >> > change the calendar tool to fix a bug, I patch it, get someone else to > >> >> > sign off and push. > >> >> > > >> >> > All of which is my way of saying the maintainers (me included) are the > >> >> > biggest bottleneck. There are a lot of reasons I think the maintainer > >> >> > model fits the kernel better, but at least for btrfs I'm trying to > >> >> > speed up the patch review process and use patchwork more effectively. > >> >> > >> >> To be clear, I'm not arguing for a maintainer-less model.
We don't > >> >> have the tooling or operational data to support that. We need > >> >> maintainers to say "no". But, what I think we can do is give > >> >> maintainers more varied ways to say it. The goal: de-escalate the > >> >> merge event as a declaration that the code quality/architecture > >> >> conversation is over. > >> >> > >> >> Release early, release often, and with care merge often. > >> > > >> > I think this falls foul of the "no regressions" rule. > >> > > >> > The kernel policy is that once the functionality gets to users, it cannot be > >> > taken away. Individual drivers in 'staging' manage to avoid this rule > >> > because they are clearly separate things. > >> > New system calls and attributes in sysfs etc seem to be much harder to > >> > "partially" release. > >> > >> My straw man is something like the following for driver "foo": > >> > >> if (gatekeeper_foo_new_awesome_sauce) > >> do_new_thing(); > >> > >> Where setting gatekeeper_foo_new_awesome_sauce taints the kernel and > >> warns that there is no guarantee of this functionality being present > >> in the same form or at all going forward. > > > > Interesting idea. > > Trying to imagine how this might play out in practice.... > > > > You talk about "value delivered to users". But users tend to use > > applications, and applications are the users of kernel features. > > > > Will anyone bother writing or adapting an application to use a feature which > > is not guaranteed to hang around? > > Maybe they will, but will the users of the application know that it might > > stop working after a kernel upgrade? Maybe... > > > > Maybe if we had some concrete examples of features that could have been > > delayed using a gatekeeper. > > > > The one that springs to my mind is cgroups. Clearly useful, but clearly > > controversial. It appears that the original implementation was seriously > > flawed and Tejun is doing a massive amount of work to "fix" it, and this > > apparently will lead to API changes. And this is happening without any > > gatekeepers. Would it have been easier in some way with gatekeepers? > > ... I don't see how it would be, except that fewer people would have used > > cgroups, and then maybe we wouldn't have as much collective experience to > > know what the real problems were(?). > > > > I think that is the key. With a user-facing option, people will try it and > > probably cope if it disappears (though they might complain loudly and sign > > petitions declaring Facebook to be the anti-$DEITY). However, with kernel-internal > > options, applications are unlikely to use them without some > > expectation of stability. So finding the problems would be a lot harder. > > > > Which doesn't mean that it can't work, but it would be nice if we created some > > real-life examples to see how it plays out in practice. > > > > Biased by my background of course, but I think driver development is > more amenable to this sort of approach. For drivers the kernel is in > many instances the application. For example, I currently have in my > review queue a patch set to add SATA port multiplier support to > libsas. I hope I get the review done in time for merging it in 3.16. > But, what if I also had the option of saying "let's gatekeeper this > for a cycle". Users that care could start using it and reporting > bugs, and it would be clear that the implementation is provisional. > My opinion is that bug reports would attract deeper code review that > otherwise would not occur if the feature was simply delayed for a > cycle.
I can certainly see how this could work for driver features. We sometimes do that sort of incremental release with CONFIG options, but those are clumsy to work with. Having run-time enablement is appealing. What might the control interface look like? I imagine something like dynamic_debug, with a file that lists all the dynamic_config options. Writing some message to the file would enable the selected options, and so trigger the dynamic code editing required to enable them. I think it is probably worth trying - see what sort of take-up it gets. NeilBrown > > I think I also would have liked to use a gatekeeper to stage the > deletion of NET_DMA from the kernel. Mark it for removal, see who > screams, but still make it straightforward for such people to make > their case, with data, for why the value should stay. > > For the core kernel, which I admittedly have not touched much, are > there cases where an application wants to make a value argument to > users, but needs some kernel infrastructure to stand on? Do we > inadvertently stifle otherwise promising experiments by forcing > upstream acceptance before the experiment gets the exposure it needs? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 38+ messages in thread
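As a rough illustration of the dynamic_debug-style interface Neil sketches, here is what a hypothetical debugfs "gatekeeper" file could look like. None of these names or entries exist in the tree, and a real implementation would also need locking and per-feature enable hooks; this is a sketch of the idea, not a proposal-quality patch.

	#include <linux/debugfs.h>
	#include <linux/kernel.h>
	#include <linux/module.h>
	#include <linux/seq_file.h>
	#include <linux/string.h>
	#include <linux/uaccess.h>

	/* one entry per maintainer-gated feature; both names are invented */
	static struct gk_entry {
		const char *name;
		bool enabled;
	} gk_entries[] = {
		{ "libsas.sata_pmp",		false },
		{ "foo.new_awesome_sauce",	false },
	};

	/* reading the file lists every option and its current state */
	static int gk_show(struct seq_file *s, void *unused)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(gk_entries); i++)
			seq_printf(s, "%s %s\n", gk_entries[i].name,
				   gk_entries[i].enabled ? "on" : "off");
		return 0;
	}

	static int gk_open(struct inode *inode, struct file *file)
	{
		return single_open(file, gk_show, NULL);
	}

	/* writing an option name enables it and taints the kernel */
	static ssize_t gk_write(struct file *file, const char __user *ubuf,
				size_t len, loff_t *ppos)
	{
		char buf[64], *name;
		int i;

		if (len >= sizeof(buf))
			return -EINVAL;
		if (copy_from_user(buf, ubuf, len))
			return -EFAULT;
		buf[len] = '\0';
		name = strim(buf);

		for (i = 0; i < ARRAY_SIZE(gk_entries); i++) {
			if (strcmp(name, gk_entries[i].name))
				continue;
			gk_entries[i].enabled = true;
			/* overriding the maintainer's "no" is recorded as a taint */
			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
			return len;
		}
		return -EINVAL;
	}

	static const struct file_operations gk_fops = {
		.owner		= THIS_MODULE,
		.open		= gk_open,
		.read		= seq_read,
		.write		= gk_write,
		.llseek		= seq_lseek,
		.release	= single_release,
	};

	static int __init gk_init(void)
	{
		debugfs_create_file("gatekeeper", 0600, NULL, NULL, &gk_fops);
		return 0;
	}
	late_initcall(gk_init);

With something like this, "cat /sys/kernel/debug/gatekeeper" would show what is staged, and "echo foo.new_awesome_sauce > /sys/kernel/debug/gatekeeper" would be the run-time analogue of loading a staging module.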
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-21 23:48 ` NeilBrown @ 2014-05-22 4:04 ` Dan Williams 0 siblings, 0 replies; 38+ messages in thread From: Dan Williams @ 2014-05-22 4:04 UTC (permalink / raw) To: NeilBrown; +Cc: ksummit-discuss On Wed, May 21, 2014 at 4:48 PM, NeilBrown <neilb@suse.de> wrote: [..] > What might the control interface look like? > I imagine something like dynamic_debug, with a file that lists all the > dynamic_config options. Writing some message to the file would enable the > selected options, and so trigger the dynamic code editing required to enable them. Ooh, yes, an interface similar to dynamic debug control seems like a good fit. > I think it is probably worth trying - see what sort of take-up it gets. Thanks Neil! ...as everyone else moans, "don't encourage him". ;-) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things 2014-05-16 2:56 ` NeilBrown 2014-05-16 15:04 ` Chris Mason @ 2014-05-21 7:22 ` Dan Williams 1 sibling, 0 replies; 38+ messages in thread From: Dan Williams @ 2014-05-21 7:22 UTC (permalink / raw) To: NeilBrown; +Cc: ksummit-discuss [ speaking for myself ] On Thu, May 15, 2014 at 7:56 PM, NeilBrown <neilb@suse.de> wrote: > On Thu, 15 May 2014 16:13:58 -0700 Dan Williams <dan.j.williams@gmail.com> > wrote: > >> What would it take and would we even consider moving 2x faster than we >> are now? > > Hi Dan, > you seem to be suggesting that there is some limit other than "competent > engineering time" which is slowing Linux "progress" down. Where "progress" is "value delivered to users", yes. > Are you really suggesting that? Yes, look at -staging as the first step down this path. Functionality delivered to users while "upstream acceptance" happens in parallel. I'm arguing for a finer-grained mechanism for staging functionality out to users. > What might these other limits be? Testing and audience. A simplistic example of moving slowly is merging a feature only after it has proven to have a large enough audience. Or the opposite, spending development resources to polish and merge a dead-on-arrival solution, but only discovering that fact once it is exposed to wider distribution. > Certainly there are limits to minimum gap between conceptualisation and > release (at least one release cycle), but is there really a limit to the > parallelism that can be achieved? Again, in general, I think there are aspects of "upstream acceptance" that can be done in parallel with delivering value to end users. ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2014-05-24 19:19 UTC | newest] Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-05-15 23:13 [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things Dan Williams 2014-05-16 2:56 ` NeilBrown 2014-05-16 15:04 ` Chris Mason 2014-05-16 17:09 ` Andy Grover 2014-05-23 8:11 ` Dan Carpenter 2014-05-16 18:31 ` Randy Dunlap 2014-05-21 7:48 ` Dan Williams 2014-05-21 7:55 ` Greg KH 2014-05-21 9:05 ` Matt Fleming 2014-05-21 12:52 ` Greg KH 2014-05-21 13:23 ` Matt Fleming 2014-05-21 8:25 ` NeilBrown 2014-05-21 8:36 ` Dan Williams 2014-05-21 8:53 ` Matt Fleming 2014-05-21 10:11 ` NeilBrown 2014-05-21 15:35 ` Dan Williams 2014-05-21 23:06 ` Rafael J. Wysocki 2014-05-21 23:03 ` Dan Williams 2014-05-21 23:40 ` Laurent Pinchart 2014-05-22 0:10 ` Rafael J. Wysocki 2014-05-22 15:48 ` Theodore Ts'o 2014-05-22 16:31 ` Dan Williams 2014-05-22 17:38 ` Theodore Ts'o 2014-05-22 18:42 ` Dan Williams 2014-05-22 19:06 ` Chris Mason 2014-05-22 20:31 ` Dan Carpenter 2014-05-22 20:56 ` Geert Uytterhoeven 2014-05-23 6:21 ` James Bottomley 2014-05-23 14:11 ` John W. Linville 2014-05-24 9:14 ` James Bottomley 2014-05-24 19:19 ` Geert Uytterhoeven 2014-05-23 2:13 ` Greg KH 2014-05-23 3:03 ` Dan Williams 2014-05-23 7:44 ` Greg KH 2014-05-23 14:02 ` Josh Boyer 2014-05-21 23:48 ` NeilBrown 2014-05-22 4:04 ` Dan Williams 2014-05-21 7:22 ` Dan Williams