From: Dan Williams
To: "Rafael J. Wysocki"
Cc: ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things
Date: Wed, 21 May 2014 16:03:49 -0700
In-Reply-To: <2980546.hqgiQV7seV@vostro.rjw.lan>
References: <20140521201108.76ab84af@notabene.brown> <2980546.hqgiQV7seV@vostro.rjw.lan>

On Wed, May 21, 2014 at 4:06 PM, Rafael J. Wysocki wrote:
> On Wednesday, May 21, 2014 08:35:55 AM Dan Williams wrote:
>> On Wed, May 21, 2014 at 3:11 AM, NeilBrown wrote:
>> > On Wed, 21 May 2014 01:36:55 -0700 Dan Williams wrote:
>> >
>> >> On Wed, May 21, 2014 at 1:25 AM, NeilBrown wrote:
>> >> > On Wed, 21 May 2014 00:48:48 -0700 Dan Williams wrote:
>> >> >
>> >> >> On Fri, May 16, 2014 at 8:04 AM, Chris Mason wrote:
>> >> >> > On 05/15/2014 10:56 PM, NeilBrown wrote:
>> >> >> >> On Thu, 15 May 2014 16:13:58 -0700 Dan Williams wrote:
>> >> >> >>
>> >> >> >>> What would it take, and would we even consider moving 2x faster
>> >> >> >>> than we are now?
>> >> >> >>
>> >> >> >> Hi Dan, you seem to be suggesting that there is some limit other
>> >> >> >> than "competent engineering time" which is slowing Linux "progress"
>> >> >> >> down.
>> >> >> >>
>> >> >> >> Are you really suggesting that? What might these other limits be?
>> >> >> >>
>> >> >> >> Certainly there are limits to the minimum gap between conceptualisation
>> >> >> >> and release (at least one release cycle), but is there really a
>> >> >> >> limit to the parallelism that can be achieved?
>> >> >> >
>> >> >> > I haven't compared the FB commit rates with the kernel, but I'll
>> >> >> > pretend Dan's basic thesis is right and talk about which parts of the
>> >> >> > facebook model may move faster than the kernel.
>> >> >> >
>> >> >> > The facebook model is pretty similar to the way the kernel works. The
>> >> >> > merge window lasts a few days and the major releases are every week,
>> >> >> > but overall it isn't too far away.
>> >> >> >
>> >> >> > The biggest difference is that we have a centralized tool for
>> >> >> > reviewing the patches, and once a patch has been reviewed by a specific
>> >> >> > number of people, you push it in.
>> >> >> >
>> >> >> > The patch submission tool runs the patch through lint and various
>> >> >> > static analysis to make sure it follows proper coding style and
>> >> >> > doesn't include patterns of known bugs. This cuts down on the review
>> >> >> > work because the silly coding style mistakes are gone before it gets
>> >> >> > to the tool.
>> >> >> >
>> >> >> > When you put in a patch, you have to put in reviewers, and they get a
>> >> >> > little notification that your patch needs review. Once the reviewers
>> >> >> > are happy, you push the patch in.
>> >> >> >
>> >> >> > The biggest difference: there are no maintainers. If I want to go
>> >> >> > change the calendar tool to fix a bug, I patch it, get someone else to
>> >> >> > sign off, and push.
>> >> >> >
>> >> >> > All of which is my way of saying the maintainers (me included) are the
>> >> >> > biggest bottleneck. There are a lot of reasons I think the maintainer
>> >> >> > model fits the kernel better, but at least for btrfs I'm trying to
>> >> >> > speed up the patch review process and use patchwork more effectively.
>> >> >>
>> >> >> To be clear, I'm not arguing for a maintainer-less model. We don't
>> >> >> have the tooling or operational data to support that. We need
>> >> >> maintainers to say "no". But what I think we can do is give
>> >> >> maintainers more varied ways to say it. The goal: de-escalate the
>> >> >> merge event as a declaration that the code quality/architecture
>> >> >> conversation is over.
>> >> >>
>> >> >> Release early, release often, and with care merge often.
>> >> >
>> >> > I think this falls foul of the "no regressions" rule.
>> >> >
>> >> > The kernel policy is that once functionality gets to users, it cannot be
>> >> > taken away. Individual drivers in 'staging' manage to avoid this rule
>> >> > because they are clearly separate things.
>> >> > New system calls, attributes in sysfs, etc. seem to be much harder to
>> >> > "partially" release.
>> >>
>> >> My straw man is something like the following for driver "foo":
>> >>
>> >>     if (gatekeeper_foo_new_awesome_sauce)
>> >>         do_new_thing();
>> >>
>> >> where setting gatekeeper_foo_new_awesome_sauce taints the kernel and
>> >> warns that there is no guarantee of this functionality being present
>> >> in the same form, or at all, going forward.
>> >
>> > Interesting idea.
>> > Trying to imagine how this might play out in practice...
>> >
>> > You talk about "value delivered to users". But users tend to use
>> > applications, and applications are the users of kernel features.
>> >
>> > Will anyone bother writing or adapting an application to use a feature which
>> > is not guaranteed to hang around?
>> > Maybe they will, but will the users of the application know that it might
>> > stop working after a kernel upgrade? Maybe...
>> >
>> > Maybe it would help if we had some concrete examples of features that could
>> > have been delayed using a gatekeeper.
>> >
>> > The one that springs to my mind is cgroups. Clearly useful, but clearly
>> > controversial. It appears that the original implementation was seriously
>> > flawed and Tejun is doing a massive amount of work to "fix" it, and this
>> > apparently will lead to API changes. And this is happening without any
>> > gatekeepers. Would it have been easier in some way with gatekeepers?
>> > ... I don't see how it would be, except that fewer people would have used
>> > cgroups, and then maybe we wouldn't have as much collective experience to
>> > know what the real problems were(?).
>> >
>> > I think that is the key. With a user-facing option, people will try it and
>> > probably cope if it disappears (though they might complain loudly and sign
>> > petitions declaring facebook to be the anti-$DEITY). However, with kernel-
>> > internal options, applications are unlikely to use them without some
>> > expectation of stability. So finding the problems would be a lot harder.
>> >
>> > Which doesn't mean that it can't work, but it would be nice to create some
>> > real-life examples to see how it plays out in practice.
>> >
>> Biased by my background of course, but I think driver development is
>> more amenable to this sort of approach. For drivers, the kernel is in
>> many instances the application. For example, I currently have in my
>> review queue a patch set to add SATA port multiplier support to
>> libsas. I hope I get the review done in time for merging it in 3.16.
>> But what if I also had the option of saying "let's gatekeeper this
>> for a cycle"? Users that care could start using it and reporting
>> bugs, and it would be clear that the implementation is provisional.
>> My opinion is that bug reports would attract deeper code review that
>> otherwise would not occur if the feature were simply delayed for a
>> cycle.
>
> There's more to it than that.
>
> The model you're referring to is only possible if all participants are
> employees of one company or otherwise members of one organization that
> has some kind of control over them. Kernel development is not done
> like that, though, so I'm afraid that the Facebook experience is not
> directly applicable here.
>
> For example, we take patches from pretty much everyone on the Internet.
> Does Facebook do that too? I don't think so.
>

I'm struggling to see how this addresses my new libsas feature example.
Simply put, if an end user knows how to override a "gatekeeper", that user
can test features that we are otherwise still debating upstream. They can
of course also apply the patches directly, but I am proposing that we
formalize a mechanism to encourage more experimentation in-tree.

I'm fully aware that we have neither the tactical data nor the operational
control to run the kernel like a website; that's not my concern. My concern
is with expanding a maintainer's options for mitigating risk.
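
To make the straw man from earlier in the thread a little more concrete,
here is a rough sketch of what a gatekeeper could look like for a
hypothetical driver "foo". The parameter name, the foo_* helpers, and the
reuse of TAINT_CRAP are all placeholders, not a worked-out proposal:

#include <linux/kernel.h>
#include <linux/module.h>

/* gatekeeper for a provisional feature in driver "foo"; off by default */
static bool new_awesome_sauce;
module_param(new_awesome_sauce, bool, 0444);
MODULE_PARM_DESC(new_awesome_sauce,
		 "Enable provisional feature X (taints the kernel; no guarantee "
		 "the interface survives in this form, or at all)");

/* placeholder for the gated code path */
static void foo_enable_new_awesome_sauce(void)
{
	/* the new, provisional behaviour would live here */
}

static void foo_setup_features(void)
{
	if (!new_awesome_sauce)
		return;		/* stock behaviour, nothing changes */

	pr_warn("foo: provisional feature enabled; it may change or disappear\n");
	add_taint(TAINT_CRAP, LOCKDEP_STILL_OK);	/* or a dedicated TAINT_ flag */
	foo_enable_new_awesome_sauce();
}

A user who wants to opt in boots with foo.new_awesome_sauce=1, gets the new
behaviour, and gets a tainted kernel plus a log message spelling out that
the interface is provisional. Whether we would reuse TAINT_CRAP or add a
dedicated taint flag for gatekeepers is an open question.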