* [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
@ 2019-07-03 1:35 Sasha Levin
2019-07-03 14:57 ` Laura Abbott
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Sasha Levin @ 2019-07-03 1:35 UTC (permalink / raw)
To: ksummit-discuss
Hi folks,
If there is interest, I'd like to go over the (minor) changes that went
into the -stable kernel process since last year's MS, the various
automations we now have, and how we have addressed some of the pain
points that came up last year. I'd also love to hear from folks about
the issues they're seeing with the process, and if there's anything we
can do to make it better.
Some of the concerns that were raised during last year's MS (both in the
group session as well as in the hallway track) which we've tried to
address are:
- Commits missing because authors did not respond to Greg's "FAILED:"
mails.
- Concerns about how well -stable kernels are tested.
- "Fixes for fixes" end up being missed.
- Saner AUTOSEL process.
- Tracking of dropped commits.
I found last year's feedback very valuable and hopefully have addressed
some of it, hoping for the same this year as well.
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-03 1:35 [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement Sasha Levin
@ 2019-07-03 14:57 ` Laura Abbott
2019-07-05 13:54 ` Michael Ellerman
2019-07-05 16:41 ` Mark Brown
2 siblings, 0 replies; 27+ messages in thread
From: Laura Abbott @ 2019-07-03 14:57 UTC (permalink / raw)
To: Sasha Levin, ksummit-discuss

On 7/2/19 9:35 PM, Sasha Levin wrote:
> Hi folks,
>
> If there is interest, I'd like to go over the (minor) changes that went
> into the -stable kernel process since last year's MS, the various
> automations we now have, and how we have addressed some of the pain
> points that came up last year. I'd also love to hear from folks about
> the issues they're seeing with the process, and if there's anything we
> can do to make it better.
>
> Some of the concerns that were raised during last year's MS (both in the
> group session as well as in the hallway track) which we've tried to
> address are:
>
> - Commits missing because authors did not respond to Greg's "FAILED:"
> mails.
> - Concerns about how well -stable kernels are tested.
> - "Fixes for fixes" end up being missed.
> - Saner AUTOSEL process.
> - Tracking of dropped commits.
>
> I found last year's feedback very valuable and hopefully have addressed
> some of it, hoping for the same this year as well.

I'm certainly interested in this.

Thanks,
Laura

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-03 1:35 [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement Sasha Levin
2019-07-03 14:57 ` Laura Abbott
@ 2019-07-05 13:54 ` Michael Ellerman
2019-07-05 14:13 ` Takashi Iwai
2019-07-05 16:41 ` Mark Brown
2 siblings, 1 reply; 27+ messages in thread
From: Michael Ellerman @ 2019-07-05 13:54 UTC (permalink / raw)
To: Sasha Levin, ksummit-discuss

Sasha Levin <sashal@kernel.org> writes:
> Hi folks,
>
> If there is interest, I'd like to go over the (minor) changes that went
> into the -stable kernel process since last year's MS, the various
> automations we now have, and how we have addressed some of the pain
> points that came up last year. I'd also love to hear from folks about
> the issues they're seeing with the process, and if there's anything we
> can do to make it better.
>
> Some of the concerns that were raised during last year's MS (both in the
> group session as well as in the hallway track) which we've tried to
> address are:
>
> - Commits missing because authors did not respond to Greg's "FAILED:"
> mails.
> - Concerns about how well -stable kernels are tested.
> - "Fixes for fixes" end up being missed.
> - Saner AUTOSEL process.
> - Tracking of dropped commits.

Yeah definitely interested in this.

Especially the tracking part. I have been trying to keep track of
powerpc commits that need backporting, but haven't really come up with a
good system. So would be interested in what you and/or others are doing.

Something I've been experimenting with is using git notes to mark
commits that have been fixed by a subsequent commit. This gives you a
two way link between the fix and the fixed commit, and you can get the
notes to show up in git log, like:

  commit 1846193b178dcc58435fdc57352db7b74826ef37
  Author: Michael Ellerman <mpe@ellerman.id.au>
  Date:   Thu Jul 7 22:54:29 2016 +1000

      powerpc/xmon: Dump ISA 2.06 SPRs

      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

  Notes (fixed):
      Fixed-by: c47a94031e81 ("powerpc/xmon: Fix display of SPRs")


I'd like to extend this to the stable trees, so you could have output
something like:

  commit 1846193b178dcc58435fdc57352db7b74826ef37
  Author: Michael Ellerman <mpe@ellerman.id.au>
  Date:   Thu Jul 7 22:54:29 2016 +1000

      powerpc/xmon: Dump ISA 2.06 SPRs

      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

  Notes (fixed):
      Fixed-by: c47a94031e81 ("powerpc/xmon: Fix display of SPRs")
      v4.9.y: deadbeef0000 ("powerpc/xmon: Fix display of SPRs")
      v4.10.y: not found


Git notes are also just blobs, so in theory the processing to generate
those notes could be done once and pushed to a repo where everyone could
pull them.

cheers

^ permalink raw reply [flat|nested] 27+ messages in thread
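[Editorial sketch] The reverse mapping Michael describes above could be
derived from Fixes: tags roughly as follows. This is a minimal sketch, not
his actual tooling: the function names (parse_fixes, build_fixed_by,
notes_for) are hypothetical, commit data is passed in as plain strings
rather than read from git, and the Fixes: line in the usage example is
made up for illustration.

```python
import re
from collections import defaultdict

# Matches "Fixes: <abbreviated-or-full sha> ..." at the start of a line.
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})\b",
                      re.MULTILINE | re.IGNORECASE)


def parse_fixes(message):
    """Return the (possibly abbreviated) commit ids named in Fixes: tags."""
    return [sha.lower() for sha in FIXES_RE.findall(message)]


def build_fixed_by(commits):
    """Build the reverse map: fixed commit id -> list of note lines.

    `commits` is an iterable of (sha, subject, full_message) tuples
    (a hypothetical input shape; a real tool would read these from git).
    """
    notes = defaultdict(list)
    for sha, subject, message in commits:
        for fixed in parse_fixes(message):
            notes[fixed].append('Fixed-by: %s ("%s")' % (sha[:12], subject))
    return dict(notes)


def notes_for(commit_sha, notes):
    """Collect note lines whose abbreviated key is a prefix of commit_sha."""
    lines = []
    for key, entries in notes.items():
        if commit_sha.startswith(key):
            lines.extend(entries)
    return lines


if __name__ == "__main__":
    # Illustrative input only: the Fixes: line below is assumed, not
    # copied from the real commit.
    fix_message = (
        "powerpc/xmon: Fix display of SPRs\n\n"
        'Fixes: 1846193b178d ("powerpc/xmon: Dump ISA 2.06 SPRs")\n'
    )
    notes = build_fixed_by(
        [("c47a94031e81", "powerpc/xmon: Fix display of SPRs", fix_message)]
    )
    for line in notes_for("1846193b178dcc58435fdc57352db7b74826ef37", notes):
        print(line)
```

Each resulting line could then be attached with something like
"git notes --ref=fixed add -m '<line>' <commit>" and displayed with
"git log --notes=fixed", which is what produces a "Notes (fixed):" section
like the one in the example output above.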
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-05 13:54 ` Michael Ellerman
@ 2019-07-05 14:13 ` Takashi Iwai
2019-07-05 16:17 ` Greg KH
2019-07-05 16:52 ` Sasha Levin
0 siblings, 2 replies; 27+ messages in thread
From: Takashi Iwai @ 2019-07-05 14:13 UTC (permalink / raw)
To: Michael Ellerman; +Cc: ksummit-discuss

On Fri, 05 Jul 2019 15:54:11 +0200,
Michael Ellerman wrote:
>
> Sasha Levin <sashal@kernel.org> writes:
> > Hi folks,
> >
> > If there is interest, I'd like to go over the (minor) changes that went
> > into the -stable kernel process since last year's MS, the various
> > automations we now have, and how we have addressed some of the pain
> > points that came up last year. I'd also love to hear from folks about
> > the issues they're seeing with the process, and if there's anything we
> > can do to make it better.
> >
> > Some of the concerns that were raised during last year's MS (both in the
> > group session as well as in the hallway track) which we've tried to
> > address are:
> >
> > - Commits missing because authors did not respond to Greg's "FAILED:"
> > mails.
> > - Concerns about how well -stable kernels are tested.
> > - "Fixes for fixes" end up being missed.
> > - Saner AUTOSEL process.
> > - Tracking of dropped commits.
>
> Yeah definitely interested in this.
>
> Especially the tracking part. I have been trying to keep track of
> powerpc commits that need backporting, but haven't really come up with a
> good system. So would be interested in what you and/or others are doing.
>
> Something I've been experimenting with is using git notes to mark
> commits that have been fixed by a subsequent commit. This gives you a
> two way link between the fix and the fixed commit, and you can get the
> notes to show up in git log, like:
>
>   commit 1846193b178dcc58435fdc57352db7b74826ef37
>   Author: Michael Ellerman <mpe@ellerman.id.au>
>   Date:   Thu Jul 7 22:54:29 2016 +1000
>
>       powerpc/xmon: Dump ISA 2.06 SPRs
>
>       Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>
>   Notes (fixed):
>       Fixed-by: c47a94031e81 ("powerpc/xmon: Fix display of SPRs")
>
>
> I'd like to extend this to the stable trees, so you could have output
> something like:
>
>   commit 1846193b178dcc58435fdc57352db7b74826ef37
>   Author: Michael Ellerman <mpe@ellerman.id.au>
>   Date:   Thu Jul 7 22:54:29 2016 +1000
>
>       powerpc/xmon: Dump ISA 2.06 SPRs
>
>       Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>
>   Notes (fixed):
>       Fixed-by: c47a94031e81 ("powerpc/xmon: Fix display of SPRs")
>       v4.9.y: deadbeef0000 ("powerpc/xmon: Fix display of SPRs")
>       v4.10.y: not found
>
>
> Git notes are also just blobs, so in theory the processing to generate
> those notes could be done once and pushed to a repo where everyone could
> pull them.

Yes, I'd love to have (and share) this kind of reverse mapping
information. But somehow using git-notes for such a purpose wasn't
accepted widely. IIRC, Linus mentioned that git-notes is a hack, and
indeed it is. But if the entries aren't too big, it would work well
enough, I guess. Once the size starts to matter, we can consider
switching to a better infrastructure...

FWIW, SUSE tracks the possible upstream fixes by parsing the Fixes tag
regularly, so it's proven to be useful.


thanks,

Takashi

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-05 14:13 ` Takashi Iwai
@ 2019-07-05 16:17 ` Greg KH
2019-07-05 16:52 ` Sasha Levin
1 sibling, 0 replies; 27+ messages in thread
From: Greg KH @ 2019-07-05 16:17 UTC (permalink / raw)
To: Takashi Iwai; +Cc: ksummit-discuss

On Fri, Jul 05, 2019 at 04:13:35PM +0200, Takashi Iwai wrote:
>
> FWIW, SUSE tracks the possible upstream fixes by parsing Fixes tag
> regularly, so it's proven to be useful.

Yeah, it's the fixes tag parsing that I know I use (well, should use
more often than I do). I think Sasha runs that type of script more
often than I do.

thanks,

greg k-h

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-05 14:13 ` Takashi Iwai
2019-07-05 16:17 ` Greg KH
@ 2019-07-05 16:52 ` Sasha Levin
1 sibling, 0 replies; 27+ messages in thread
From: Sasha Levin @ 2019-07-05 16:52 UTC (permalink / raw)
To: Takashi Iwai; +Cc: ksummit-discuss

On Fri, Jul 05, 2019 at 04:13:35PM +0200, Takashi Iwai wrote:
>On Fri, 05 Jul 2019 15:54:11 +0200,
>Michael Ellerman wrote:
>>
>> Sasha Levin <sashal@kernel.org> writes:
>> > Hi folks,
>> >
>> > If there is interest, I'd like to go over the (minor) changes that went
>> > into the -stable kernel process since last year's MS, the various
>> > automations we now have, and how we have addressed some of the pain
>> > points that came up last year. I'd also love to hear from folks about
>> > the issues they're seeing with the process, and if there's anything we
>> > can do to make it better.
>> >
>> > Some of the concerns that were raised during last year's MS (both in the
>> > group session as well as in the hallway track) which we've tried to
>> > address are:
>> >
>> > - Commits missing because authors did not respond to Greg's "FAILED:"
>> > mails.
>> > - Concerns about how well -stable kernels are tested.
>> > - "Fixes for fixes" end up being missed.
>> > - Saner AUTOSEL process.
>> > - Tracking of dropped commits.
>>
>> Yeah definitely interested in this.
>>
>> Especially the tracking part. I have been trying to keep track of
>> powerpc commits that need backporting, but haven't really come up with a
>> good system. So would be interested in what you and/or others are doing.
>>
>> Something I've been experimenting with is using git notes to mark
>> commits that have been fixed by a subsequent commit. This gives you a
>> two way link between the fix and the fixed commit, and you can get the
>> notes to show up in git log, like:
>>
>>   commit 1846193b178dcc58435fdc57352db7b74826ef37
>>   Author: Michael Ellerman <mpe@ellerman.id.au>
>>   Date:   Thu Jul 7 22:54:29 2016 +1000
>>
>>       powerpc/xmon: Dump ISA 2.06 SPRs
>>
>>       Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>>
>>   Notes (fixed):
>>       Fixed-by: c47a94031e81 ("powerpc/xmon: Fix display of SPRs")
>>
>>
>> I'd like to extend this to the stable trees, so you could have output
>> something like:
>>
>>   commit 1846193b178dcc58435fdc57352db7b74826ef37
>>   Author: Michael Ellerman <mpe@ellerman.id.au>
>>   Date:   Thu Jul 7 22:54:29 2016 +1000
>>
>>       powerpc/xmon: Dump ISA 2.06 SPRs
>>
>>       Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>>
>>   Notes (fixed):
>>       Fixed-by: c47a94031e81 ("powerpc/xmon: Fix display of SPRs")
>>       v4.9.y: deadbeef0000 ("powerpc/xmon: Fix display of SPRs")
>>       v4.10.y: not found
>>
>>
>> Git notes are also just blobs, so in theory the processing to generate
>> those notes could be done once and pushed to a repo where everyone could
>> pull them.
>
>Yes, I'd love to have (and share) this kind of reverse mapping
>information. But somehow using git-notes for such a purpose wasn't
>accepted widely. IIRC, Linus mentioned that git-notes is a hack, and
>indeed it is. But if the entries aren't too big, it would work well
>enough, I guess. Once the size starts to matter, we can consider
>switching to a better infrastructure...
>
>FWIW, SUSE tracks the possible upstream fixes by parsing the Fixes tag
>regularly, so it's proven to be useful.

Indeed, I also have quite a few scripts that do interesting things with
the fixes tag (such as the "fixes for fixes" script, which tries to
determine whether a certain fix was backported and whether the new fix
would apply to older LTS trees).

I'm toying with a similar idea for git notes, but my approach was to
extract mailing list conversations that are related to the patch in
question and add them as git notes to the commit they're discussing.
This means that when I do 'git log' to see a commit I'm about to
backport, I also get all the mailing list context related to it, which
often tends to be more valuable than the commit message itself.

This is the sort of thing I feel would be useful beyond just -stable
work; I'm sure that everyone has spent hours sifting through the mailing
list to understand some of the logic of a given patch. I'd love to have
better integration between our git tree and the mailing list.

--
Thanks,
Sasha

^ permalink raw reply [flat|nested] 27+ messages in thread
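[Editorial sketch] The "fixes for fixes" check Sasha mentions above might
look something like the following. This is only a guess at the general
shape, not his script: the function names are hypothetical, commit
messages are passed in as strings, and it relies on the two back-reference
styles commonly seen in stable commits ("commit <sha> upstream." and
"[ Upstream commit <sha> ]"). It does not test whether the fix applies
cleanly to an older tree.

```python
import re

# Back-references that stable commits commonly carry to their mainline
# original (full 40-character ids assumed).
UPSTREAM_RE = re.compile(
    r"^(?:commit ([0-9a-f]{40}) upstream\.?"
    r"|\[ Upstream commit ([0-9a-f]{40}) \])\s*$",
    re.MULTILINE,
)
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})\b",
                      re.MULTILINE | re.IGNORECASE)


def upstream_ids(stable_messages):
    """Return the set of mainline commit ids already backported."""
    ids = set()
    for message in stable_messages:
        for first, second in UPSTREAM_RE.findall(message):
            ids.add(first or second)
    return ids


def missing_fixes(upstream_commits, backported):
    """List (fix_sha, fixed_id) pairs where the fixed commit was
    backported but the fix itself was not.

    `upstream_commits` is an iterable of (sha, message) tuples from
    mainline (a hypothetical input shape).
    """
    missing = []
    for sha, message in upstream_commits:
        if sha in backported:
            continue  # the fix is already in the stable tree
        for target in FIXES_RE.findall(message):
            target = target.lower()
            # Fixes: tags are usually abbreviated, so match by prefix.
            if any(full.startswith(target) for full in backported):
                missing.append((sha, target))
                break
    return missing
```

A caller would feed upstream_ids() the commit messages of a stable branch
and missing_fixes() the commit messages of mainline; every pair returned
is a candidate "fix for a fix" that still needs review for backporting.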
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-03 1:35 [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement Sasha Levin
2019-07-03 14:57 ` Laura Abbott
2019-07-05 13:54 ` Michael Ellerman
@ 2019-07-05 16:41 ` Mark Brown
2019-07-05 20:12 ` Sasha Levin
2 siblings, 1 reply; 27+ messages in thread
From: Mark Brown @ 2019-07-05 16:41 UTC (permalink / raw)
To: Sasha Levin; +Cc: ksummit-discuss

On Tue, Jul 02, 2019 at 09:35:57PM -0400, Sasha Levin wrote:

> - Concerns about how well -stable kernels are tested.
> - "Fixes for fixes" end up being missed.
> - Saner AUTOSEL process.

I'm a bit worried about these, especially pushed together - one
of the things the AUTOSEL stuff does quite often is pull in
driver changes and our coverage of drivers is especially weak.
When a person has explicitly flagged something for stable it's
still risky but the automation adds that extra level of
uncertainty.

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-05 16:41 ` Mark Brown
@ 2019-07-05 20:12 ` Sasha Levin
2019-07-06 0:32 ` Mark Brown
0 siblings, 1 reply; 27+ messages in thread
From: Sasha Levin @ 2019-07-05 20:12 UTC (permalink / raw)
To: Mark Brown; +Cc: ksummit-discuss

On Fri, Jul 05, 2019 at 05:41:42PM +0100, Mark Brown wrote:
>On Tue, Jul 02, 2019 at 09:35:57PM -0400, Sasha Levin wrote:
>
>> - Concerns about how well -stable kernels are tested.
>> - "Fixes for fixes" end up being missed.
>> - Saner AUTOSEL process.
>
>I'm a bit worried about these, especially pushed together - one
>of the things the AUTOSEL stuff does quite often is pull in
>driver changes and our coverage of drivers is especially weak.

Our driver coverage is indeed weak, but I don't think that the solution
is to leave drivers/ alone. On the contrary, I think that making
drivers/ move quickly together with the rest of the kernel will
encourage vendors to up their testing game.

This came up in the last MS, and the agreement there was that we expect
stable kernel users to test their workloads before throwing it into
production.

If we were to start avoiding driver updates, it would act as an
incentive for people not to upgrade their kernel.

Right now I'm working with a certain hardware vendor who does a crappy
job at tagging fixes for stable, and it's horribly painful. I end up
spending time triaging a bug, reporting it to the vendor, only to be
told "oh grab this fix from upstream".

This user experience is just bad, and I can't imagine how difficult it
is for users who are less familiar with the kernel.

--
Thanks,
Sasha

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-05 20:12 ` Sasha Levin
@ 2019-07-06 0:32 ` Mark Brown
2019-07-08 11:02 ` Sasha Levin
0 siblings, 1 reply; 27+ messages in thread
From: Mark Brown @ 2019-07-06 0:32 UTC (permalink / raw)
To: Sasha Levin; +Cc: ksummit-discuss

On Fri, Jul 05, 2019 at 04:12:31PM -0400, Sasha Levin wrote:
> On Fri, Jul 05, 2019 at 05:41:42PM +0100, Mark Brown wrote:

> > I'm a bit worried about these, especially pushed together - one
> > of the things the AUTOSEL stuff does quite often is pull in
> > driver changes and our coverage of drivers is especially weak.

> Our driver coverage is indeed weak, but I don't think that the solution
> is to leave drivers/ alone. On the contrary, I think that making
> drivers/ move quickly together with the rest of the kernel will
> encourage vendors to up their testing game.

I'm not saying leave it alone, it's more a question of how
aggressive we are about picking up things we think might be
relevant fixes but haven't had some sort of domain specific
analysis of. Testing is a good way to mitigate the potential
risks here.

> This came up in the last MS, and the agreement there was that we expect
> stable kernel users to test their workloads before throwing it into
> production.

That's kind of the problem - if people are doing testing and end
up finding problems coming back in the stable kernel that's the
sort of thing that encourages them to not just take stable en
masse as we say they should. Part of the deal with stable is
that it is conservative, people can trust it to be a low risk
update. That's not happening now as far as I'm aware but it does
worry me that it might happen.

> If we were to start avoiding driver updates, it would act as an
> incentive for people not to upgrade their kernel.

I'm not sure I follow the logic here?

> Right now I'm working with a certain hardware vendor who does a crappy
> job at tagging fixes for stable, and it's horribly painful. I end up
> spending time triaging a bug, reporting it to the vendor, only to be
> told "oh grab this fix from upstream".

> This user experience is just bad, and I can't imagine how difficult it
> is for users who are less familiar with the kernel.

Well, the advice from the upstream community has always been that
you should track upstream and I'm sure people will be praising
this vendor's upstream focus but obviously that's not always
terribly helpful or realistic for production systems. In my
(mostly embedded and consumer electronics based) experience
support for older kernel versions is generally part of the
commercial discussion with the hardware vendor, there's an
understanding that the hardware will only get bought if it works
on kernel versions that are useful to the customer or (depending
on the power relationships) that the customer will use kernel
versions that the vendor supports. Sometimes, especially for
smaller customers, that doesn't work out but those are usually
the people who are more likely to track upstream and/or do
considerable testing before fixing a version and generally are on
their own.

This is where the out of tree patch stacks from vendors come from
- everyone agrees that they'll use one or more given kernel
versions, enterprise distros or whatever and then the vendor
commits to supporting what's agreed but often that doesn't just
include bug fixing but also new features (or entirely new bits of
hardware). As a result those vendors are shipping their patch
stacks out of tree, users are getting their bug fixes from there
and those vendors are not finding much user demand for vanilla
LTS as a separate thing. They may even find conflicts with it an
annoying hassle. Frankly for them upstream support is often a
bit of an investment in reducing the cost of future out of tree
patch stacks and giving a longer general market life to products
rather than something customers directly demand. None of this is
ideal from an upstream point of view of course but it does
function for people.

It sounds like somewhere along the line this process has come
unstuck for you and you have a vendor that's not aligned with
what you need but I don't think that's quite the same question as
the issues with pulling patches into stable without either
testing coverage or direct identification of an issue by someone
with domain knowledge which is what I'm worrying about.

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-06 0:32 ` Mark Brown
@ 2019-07-08 11:02 ` Sasha Levin
2019-07-08 11:35 ` Jiri Kosina
2019-07-08 12:37 ` Mark Brown
0 siblings, 2 replies; 27+ messages in thread
From: Sasha Levin @ 2019-07-08 11:02 UTC (permalink / raw)
To: Mark Brown; +Cc: ksummit-discuss

On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:
>On Fri, Jul 05, 2019 at 04:12:31PM -0400, Sasha Levin wrote:
>> On Fri, Jul 05, 2019 at 05:41:42PM +0100, Mark Brown wrote:
>
>> > I'm a bit worried about these, especially pushed together - one
>> > of the things the AUTOSEL stuff does quite often is pull in
>> > driver changes and our coverage of drivers is especially weak.
>
>> Our driver coverage is indeed weak, but I don't think that the solution
>> is to leave drivers/ alone. On the contrary, I think that making
>> drivers/ move quickly together with the rest of the kernel will
>> encourage vendors to up their testing game.
>
>I'm not saying leave it alone, it's more a question of how
>aggressive we are about picking up things we think might be
>relevant fixes but haven't had some sort of domain specific
>analysis of. Testing is a good way to mitigate the potential
>risks here.

I agree, and for the various subsystems and drivers where the
maintainers volunteer their domain specific expertise to send backports
to stable, I have "blacklisted" them from AUTOSEL since that is indeed a
much better option.

>> This came up in the last MS, and the agreement there was that we expect
>> stable kernel users to test their workloads before throwing it into
>> production.
>
>That's kind of the problem - if people are doing testing and end
>up finding problems coming back in the stable kernel that's the
>sort of thing that encourages them to not just take stable en
>masse as we say they should. Part of the deal with stable is
>that it is conservative, people can trust it to be a low risk
>update. That's not happening now as far as I'm aware but it does
>worry me that it might happen.

Right, and the rate at which AUTOSEL commits are reverted is lower than
that of commits that are actually tagged for stable. If AUTOSEL commits
on their own were being reverted left and right I'd agree we need to
tone it down, but I don't see it happening now.

>> If we were to start avoiding driver updates, it would act as an
>> incentive for people not to upgrade their kernel.
>
>I'm not sure I follow the logic here?

The way I see it, the lower your "effective delta" is between two
kernels, the easier it is to move forward. For example, if I have a
product that runs on 4.19 and uses all our core kernel code + 10
drivers, and I know that those drivers had most of the fixes backported
to my LTS tree, I'd feel much more confident going to 5.4 knowing that
I already have most of the patches that come with 5.4.

For me it's a matter of how one would budget a move from a kernel X LTS
to kernel Y LTS, and I think that as that budget requirement grows it's
actually harder to actually do it (and convince management), acting as a
negative incentive to stay with whatever works now.

>> Right now I'm working with a certain hardware vendor who does a crappy
>> job at tagging fixes for stable, and it's horribly painful. I end up
>> spending time triaging a bug, reporting it to the vendor, only to be
>> told "oh grab this fix from upstream".
>
>> This user experience is just bad, and I can't imagine how difficult it
>> is for users who are less familiar with the kernel.
>
>Well, the advice from the upstream community has always been that
>you should track upstream and I'm sure people will be praising
>this vendor's upstream focus but obviously that's not always
>terribly helpful or realistic for production systems. In my
>(mostly embedded and consumer electronics based) experience
>support for older kernel versions is generally part of the
>commercial discussion with the hardware vendor, there's an
>understanding that the hardware will only get bought if it works
>on kernel versions that are useful to the customer or (depending
>on the power relationships) that the customer will use kernel
>versions that the vendor supports. Sometimes, especially for
>smaller customers, that doesn't work out but those are usually
>the people who are more likely to track upstream and/or do
>considerable testing before fixing a version and generally are on
>their own.

I have a different experience with this. I'd like to think that we're a
bigger customer and this process wasn't working too well for us. My
thinking was that if it's broken for us I can only imagine how bad it is
for the smaller customers.

>This is where the out of tree patch stacks from vendors come from
>- everyone agrees that they'll use one or more given kernel
>versions, enterprise distros or whatever and then the vendor
>commits to supporting what's agreed but often that doesn't just
>include bug fixing but also new features (or entirely new bits of
>hardware). As a result those vendors are shipping their patch
>stacks out of tree, users are getting their bug fixes from there
>and those vendors are not finding much user demand for vanilla
>LTS as a separate thing. They may even find conflicts with it an
>annoying hassle. Frankly for them upstream support is often a
>bit of an investment in reducing the cost of future out of tree
>patch stacks and giving a longer general market life to products
>rather than something customers directly demand. None of this is
>ideal from an upstream point of view of course but it does
>function for people.

This is where our story is different, which might explain my experience
being different: we usually require vendors to upstream everything, and
so they do. This means we don't have much in the way of out-of-tree
patch stacks/fixes from the vendor directly, and we expect to pick up
patches via the regular stable process, and that hasn't worked all too
well so far.

--
Thanks,
Sasha

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 11:02 ` Sasha Levin
@ 2019-07-08 11:35 ` Jiri Kosina
2019-07-08 12:34 ` Greg KH
2019-07-08 17:56 ` Sasha Levin
2019-07-08 12:37 ` Mark Brown
1 sibling, 2 replies; 27+ messages in thread
From: Jiri Kosina @ 2019-07-08 11:35 UTC (permalink / raw)
To: Sasha Levin; +Cc: ksummit-discuss

On Mon, 8 Jul 2019, Sasha Levin wrote:

> >> If we were to start avoiding driver updates, it would act as an
> >> incentive for people not to upgrade their kernel.
> >
> >I'm not sure I follow the logic here?
>
> The way I see it, the lower your "effective delta" is between two
> kernels, the easier it is to move forward. For example, if I have a
> product that runs on 4.19 and uses all our core kernel code + 10
> drivers, and I know that those drivers had most of the fixes backported
> to my LTS tree, I'd feel much more confident going to 5.4 knowing that
> I already have most of the patches that come with 5.4.
>
> For me it's a matter of how one would budget a move from a kernel X LTS
> to kernel Y LTS, and I think that as that budget requirement grows it's
> actually harder to actually do it (and convince management), acting as a
> negative incentive to stay with whatever works now.

But where does the 'stable' aspect appear here?

I think it's reasonable to expect 'stable' to mean 'minimal number of
changes needed to maintain stability of the kernel', and that I believe
was the original purpose of the stable tree.

Now you seem to be repurposing 'stable' as 'as close to upstream as
possible in order to minimize cost of version updates'.

I guess that's one of the reasons why distros are gradually turning away
from the stable tree: the main purpose of distros is to provide
stability, while it clearly is not minimizing the accumulation of cost
for future version updates.

Thanks,

--
Jiri Kosina
SUSE Labs

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 11:35 ` Jiri Kosina
@ 2019-07-08 12:34 ` Greg KH
2019-07-08 17:56 ` Sasha Levin
1 sibling, 0 replies; 27+ messages in thread
From: Greg KH @ 2019-07-08 12:34 UTC (permalink / raw)
To: Jiri Kosina; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 01:35:15PM +0200, Jiri Kosina wrote:
> On Mon, 8 Jul 2019, Sasha Levin wrote:
>
> > >> If we were to start avoiding driver updates, it would act as an
> > >> incentive for people not to upgrade their kernel.
> > >
> > >I'm not sure I follow the logic here?
> >
> > The way I see it, the lower your "effective delta" is between two
> > kernels, the easier it is to move forward. For example, if I have a
> > product that runs on 4.19 and uses all our core kernel code + 10
> > drivers, and I know that those drivers had most of the fixes backported
> > to my LTS tree, I'd feel much more confident going to 5.4 knowing that
> > I already have most of the patches that come with 5.4.
> >
> > For me it's a matter of how one would budget a move from a kernel X LTS
> > to kernel Y LTS, and I think that as that budget requirement grows it's
> > actually harder to actually do it (and convince management), acting as a
> > negative incentive to stay with whatever works now.
>
> But where does the 'stable' aspect appear here?
>
> I think it's reasonable to expect 'stable' to mean 'minimal number of
> changes needed to maintain stability of the kernel', and that I believe
> was the original purpose of stable tree.
>
> Now you seem to be repurposing 'stable' as 'as close to upstream as
> possible in order to minimize cost of version updates'.

"stable" means "All the bugfixes that we have in Linus's tree,
backported to this one as well to resolve known issues".

That's all that is happening here with the autosel stuff. There are a
load of subsystems that still do not tag stuff for stable backporting,
and sometimes even the maintainers miss them as well (I am guilty of
that as well.) So autosel finds those fixes and backports them, it's no
different from a distro doing the exact same thing when a bug report
comes into it, but it happens _BEFORE_ the bug report happens.

> I guess that's one of the reasons why distros are gradually turning away
> from the stable tree: the main purpose of distros is to provide
> stability, while it clearly is not minimizing the accumulation of cost
> for future version updates.

That's directly opposite of what I see happening with loads of
real-world devices.

As proof of this, and as part of a talk I gave a few weeks ago, I can
quote the Android security team. They kept track of all requests that
they made to be backported to their device trees for 2018. Out of 218
requests, 201 of them were _ALREADY_ in the LTS release tree. The
remaining ones were due to out-of-tree code being in the devices, or
due to bugs in backports that were not upstream.

So again, bugs are being fixed _before_ people report them, which sounds
exactly like what a distro needs to have happen for them :)

thanks,

greg k-h

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 11:35 ` Jiri Kosina
2019-07-08 12:34 ` Greg KH
@ 2019-07-08 17:56 ` Sasha Levin
1 sibling, 0 replies; 27+ messages in thread
From: Sasha Levin @ 2019-07-08 17:56 UTC (permalink / raw)
To: Jiri Kosina; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 01:35:15PM +0200, Jiri Kosina wrote:
>On Mon, 8 Jul 2019, Sasha Levin wrote:
>>
>> >> If we were to start avoiding driver updates, it would act as an
>> >> incentive for people not to upgrade their kernel.
>> >
>> >I'm not sure I follow the logic here?
>>
>> The way I see it, the lower your "effective delta" is between two
>> kernels, the easier it is to move forward. For example, if I have a
>> product that runs on 4.19 and uses all our core kernel code + 10
>> drivers, and I know that those drivers had most of the fixes backported
>> to my LTS tree, I'd feel much more confident going to 5.4 knowing that
>> I already have most of the patches that come with 5.4.
>>
>> For me it's a matter of how one would budget a move from a kernel X LTS
>> to kernel Y LTS, and I think that as that budget requirement grows it's
>> actually harder to actually do it (and convince management), acting as a
>> negative incentive to stay with whatever works now.
>
>But where does the 'stable' aspect appear here?
>
>I think it's reasonable to expect 'stable' to mean 'minimal number of
>changes needed to maintain stability of the kernel', and that I believe
>was the original purpose of stable tree.

I think that we're parsing the words "stable kernel" differently. You
see "stable kernel" as a kernel that remains mostly the same over time
and accepts a very small number of critical fixes. On the other hand,
my expectation of a "stable kernel" is a kernel without known bugs. I
associate the word "stable" with stable runtime rather than a stable
codebase.

>Now you seem to be repurposing 'stable' as 'as close to upstream as
>possible in order to minimize cost of version updates'.

I don't think that the stable kernel was meant to lag behind upstream
too much. Even the rules suggest that a commit just has to be upstream,
without regard to how long (as long as it made one release, so ~1 week
tops).

I'm not suggesting that we should be in sync with Linus; all I'm saying
is that users who stay close to upstream have an easier time moving to
newer kernels, and we want to provide that ability to users of the
stable kernel.

>I guess that's one of the reasons why distros are gradually turning away
>from stable tree; the main purpose of distros is to provide stability,
>while it clearly is not minimizing accumulation of cost for future version
>updates.

I'm not sure about statistics, but I think that the stable tree is
gaining more "distro" users than losing them.

I think it's also important to note here that the stable tree doesn't
work for everyone, and that's perfectly fine. Even with all the AUTOSEL
stuff that goes in, a quick look at my mailbox suggests that I spent
more time finding missing patches from various distro trees than
reverting patches from the stable trees.

--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 11:02 ` Sasha Levin
2019-07-08 11:35 ` Jiri Kosina
@ 2019-07-08 12:37 ` Mark Brown
2019-07-08 14:05 ` Guenter Roeck
2019-07-08 18:01 ` Sasha Levin
1 sibling, 2 replies; 27+ messages in thread
From: Mark Brown @ 2019-07-08 12:37 UTC (permalink / raw)
To: Sasha Levin; +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 4321 bytes --]

On Mon, Jul 08, 2019 at 07:02:08AM -0400, Sasha Levin wrote:
> On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:

> > I'm not saying leave it alone, it's more a question of how
> > aggressive we are about picking up things we think might be
> > relevant fixes but haven't had some sort of domain specific
> > analysis of. Testing is a good way to mitigate the potential
> > risks here.

> I agree, and for various subsystems and drivers where the maintainers
> volunteer their domain specific expertise to send backports to stable, I
> have "blacklisted" it from AUTOSEL since indeed it's a much better
> option.

Hrm, it's definitely getting a bunch of stuff for my subsystems
where I do tag things for stable...

> > > This came up in the last MS, and the agreement there was that we expect
> > > stable kernel users to test their workloads before throwing it into
> > > production.

> > That's kind of the problem - if people are doing testing and end
> > up finding problems coming back in the stable kernel that's the
> > sort of thing that encourages them to not just take stable en
> > masse as we say they should. Part of the deal with stable is
> > that it is conservative, people can trust it to be a low risk
> > update. That's not happening now as far as I'm aware but it does
> > worry me that it might happen.

> Right, and the rate at which AUTOSEL commits are reverted is lower than
> commits that are actually tagged for stable. If AUTOSEL commits on their
> own were being reverted left and right I'd agree we need to tone it
> down, but I don't see it happening now.

I'm not sure how many people will actually report problems they
experience upstream rather than just fixing things locally and
just moving on. The more code is the more likely it is that one
of the users will report things.

> > > If we were to start avoiding driver updates, it would act as an
> > > incentive for people not to upgrade their kernel.

> > I'm not sure I follow the logic here?

> The way I see it, the lower your "effective delta" is between two
> kernels, the easier it is to move forward. For example, if I have a
> product that runs on 4.19 and uses all our core kernel code + 10
> drivers, and I know that those drivers had most of the fixes backported
> to my LTS tree, I'd feel much more confident going to 5.4 knowing that
> I already have most of the patches that come with 5.4.

I see, that's definitely a new one to me. The concerns people
usually have about upgrading are more around the core kernel
changing performance characteristics or something in a way that
disrupts important workloads. I'm not quite sure I follow the
logic there TBH, it seems to be discounting new development
rather too much - even if the drivers have been very static
there's all the integration with the rest of the kernel to think
about.

> For me it's a matter of how one would budget a move from a kernel X LTS
> to kernel Y LTS, and I think that as that budget requirement grows it's
> actually harder to actually do it (and convince management), acting as a
> negative incentive to stay with whatever works now.

If the drivers are static enough to only be getting bug fixes
surely the rest of the kernel is a massively more substantial
concern?

> I have a different experience with this. I'd like to think that we're a
> bigger customer and this process wasn't working too well for us. My
> thinking was that if it's broken for us I can only imagine how bad it is
> for the smaller customers.

...

> This is where our story is different, which might explain my experience
> being different: we usually require vendors to upstream everything, and
> so they do. This means we don't have much of a out-of-tree patch
> stacks/fixes from the vendor directly, and we expect to pick up patches
> via the regular stable process, and that didn't happen all too well so
> far.

That sounds like they didn't pick up on the bit about getting
things through LTS. This sounds like a pretty unusual request
for a vendor to be getting, it doesn't 100% surprise me that
it might take a few goes for them to understand what you're
looking for, or that you're having a worse time than most users.
For enterprise type stuff AFAICT people are expecting people to
get their stable versions from distros rather than raw LTS.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 12:37 ` Mark Brown
@ 2019-07-08 14:05 ` Guenter Roeck
2019-07-08 14:33 ` Takashi Iwai
2019-07-08 14:50 ` Mark Brown
2019-07-08 18:01 ` Sasha Levin
1 sibling, 2 replies; 27+ messages in thread
From: Guenter Roeck @ 2019-07-08 14:05 UTC (permalink / raw)
To: Mark Brown, Sasha Levin; +Cc: ksummit-discuss

On 7/8/19 5:37 AM, Mark Brown wrote:
> On Mon, Jul 08, 2019 at 07:02:08AM -0400, Sasha Levin wrote:
>> On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:
>
>>> I'm not saying leave it alone, it's more a question of how
>>> aggressive we are about picking up things we think might be
>>> relevant fixes but haven't had some sort of domain specific
>>> analysis of. Testing is a good way to mitigate the potential
>>> risks here.
>
>> I agree, and for various subsystems and drivers where the maintainers
>> volunteer their domain specific expertise to send backports to stable, I
>> have "blacklisted" it from AUTOSEL since indeed it's a much better
>> option.
>
> Hrm, it's definitely getting a bunch of stuff for my subsystems
> where I do tag things for stable...
>
>>>> This came up in the last MS, and the agreement there was that we expect
>>>> stable kernel users to test their workloads before throwing it into
>>>> production.
>
>>> That's kind of the problem - if people are doing testing and end
>>> up finding problems coming back in the stable kernel that's the
>>> sort of thing that encourages them to not just take stable en
>>> masse as we say they should. Part of the deal with stable is
>>> that it is conservative, people can trust it to be a low risk
>>> update. That's not happening now as far as I'm aware but it does
>>> worry me that it might happen.
>
>> Right, and the rate at which AUTOSEL commits are reverted is lower than
>> commits that are actually tagged for stable. If AUTOSEL commits on their
>> own were being reverted left and right I'd agree we need to tone it
>> down, but I don't see it happening now.
>
> I'm not sure how many people will actually report problems they
> experience upstream rather than just fixing things locally and
> just moving on. The more code is the more likely it is that one
> of the users will report things.
>

I for my part will most definitely report any such problems, since each
regression in stable releases is used as an argument against merging
stable releases (even if the regression rate is negligible), and I am
very interested in getting that regression rate as close to zero as
possible. Reporting each and every regression is an essential part
of that.

Guenter

>>>> If we were to start avoiding driver updates, it would act as an
>>>> incentive for people not to upgrade their kernel.
>
>>> I'm not sure I follow the logic here?
>
>> The way I see it, the lower your "effective delta" is between two
>> kernels, the easier it is to move forward. For example, if I have a
>> product that runs on 4.19 and uses all our core kernel code + 10
>> drivers, and I know that those drivers had most of the fixes backported
>> to my LTS tree, I'd feel much more confident going to 5.4 knowing that
>> I already have most of the patches that come with 5.4.
>
> I see, that's definitely a new one to me. The concerns people
> usually have about upgrading are more around the core kernel
> changing performance characteristics or something in a way that
> disrupts important workloads. I'm not quite sure I follow the
> logic there TBH, it seems to be discounting new development
> rather too much - even if the drivers have been very static
> there's all the integration with the rest of the kernel to think
> about.
>
>> For me it's a matter of how one would budget a move from a kernel X LTS
>> to kernel Y LTS, and I think that as that budget requirement grows it's
>> actually harder to actually do it (and convince management), acting as a
>> negative incentive to stay with whatever works now.
>
> If the drivers are static enough to only be getting bug fixes
> surely the rest of the kernel is a massively more substantial
> concern?
>
>> I have a different experience with this. I'd like to think that we're a
>> bigger customer and this process wasn't working too well for us. My
>> thinking was that if it's broken for us I can only imagine how bad it is
>> for the smaller customers.
>
> ...
>
>> This is where our story is different, which might explain my experience
>> being different: we usually require vendors to upstream everything, and
>> so they do. This means we don't have much of a out-of-tree patch
>> stacks/fixes from the vendor directly, and we expect to pick up patches
>> via the regular stable process, and that didn't happen all too well so
>> far.
>
> That sounds like they didn't pick up on the bit about getting
> things through LTS. This sounds like a pretty unusual request
> for a vendor to be getting, it doesn't 100% surprise me that
> it might take a few goes for them to understand what you're
> looking for, or that you're having a worse time than most users.
> For enterprise type stuff AFAICT people are expecting people to
> get their stable versions from distros rather than raw LTS.
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 14:05 ` Guenter Roeck
@ 2019-07-08 14:33 ` Takashi Iwai
2019-07-08 15:10 ` Greg KH
2019-07-08 14:50 ` Mark Brown
1 sibling, 1 reply; 27+ messages in thread
From: Takashi Iwai @ 2019-07-08 14:33 UTC (permalink / raw)
To: Guenter Roeck; +Cc: ksummit-discuss

On Mon, 08 Jul 2019 16:05:44 +0200,
Guenter Roeck wrote:
>
> On 7/8/19 5:37 AM, Mark Brown wrote:
> > On Mon, Jul 08, 2019 at 07:02:08AM -0400, Sasha Levin wrote:
> >> On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:
> >
> >>> I'm not saying leave it alone, it's more a question of how
> >>> aggressive we are about picking up things we think might be
> >>> relevant fixes but haven't had some sort of domain specific
> >>> analysis of. Testing is a good way to mitigate the potential
> >>> risks here.
> >
> >> I agree, and for various subsystems and drivers where the maintainers
> >> volunteer their domain specific expertise to send backports to stable, I
> >> have "blacklisted" it from AUTOSEL since indeed it's a much better
> >> option.
> >
> > Hrm, it's definitely getting a bunch of stuff for my subsystems
> > where I do tag things for stable...
> >
> >>>> This came up in the last MS, and the agreement there was that we expect
> >>>> stable kernel users to test their workloads before throwing it into
> >>>> production.
> >
> >>> That's kind of the problem - if people are doing testing and end
> >>> up finding problems coming back in the stable kernel that's the
> >>> sort of thing that encourages them to not just take stable en
> >>> masse as we say they should. Part of the deal with stable is
> >>> that it is conservative, people can trust it to be a low risk
> >>> update. That's not happening now as far as I'm aware but it does
> >>> worry me that it might happen.
> >
> >> Right, and the rate at which AUTOSEL commits are reverted is lower than
> >> commits that are actually tagged for stable. If AUTOSEL commits on their
> >> own were being reverted left and right I'd agree we need to tone it
> >> down, but I don't see it happening now.
> >
> > I'm not sure how many people will actually report problems they
> > experience upstream rather than just fixing things locally and
> > just moving on. The more code is the more likely it is that one
> > of the users will report things.
> >
>
> I for my part will most definitely report any such problems, since each
> regression in stable releases is used as an argument against merging
> stable releases (even if the regression rate is negligible), and I am
> very interested in getting that regression rate as close to zero as
> possible. Reporting each and every regression is an essential part
> of that.

BTW, regarding regression: currently we have no central regression
tracking. This is another big missing piece, and a thing to be
discussed in KS, IMO.

thanks,

Takashi
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 14:33 ` Takashi Iwai
@ 2019-07-08 15:10 ` Greg KH
2019-07-08 15:18 ` Takashi Iwai
` (3 more replies)
0 siblings, 4 replies; 27+ messages in thread
From: Greg KH @ 2019-07-08 15:10 UTC (permalink / raw)
To: Takashi Iwai; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 04:33:28PM +0200, Takashi Iwai wrote:
> On Mon, 08 Jul 2019 16:05:44 +0200,
> Guenter Roeck wrote:
> >
> > On 7/8/19 5:37 AM, Mark Brown wrote:
> > > On Mon, Jul 08, 2019 at 07:02:08AM -0400, Sasha Levin wrote:
> > >> On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:
> > >
> > >>> I'm not saying leave it alone, it's more a question of how
> > >>> aggressive we are about picking up things we think might be
> > >>> relevant fixes but haven't had some sort of domain specific
> > >>> analysis of. Testing is a good way to mitigate the potential
> > >>> risks here.
> > >
> > >> I agree, and for various subsystems and drivers where the maintainers
> > >> volunteer their domain specific expertise to send backports to stable, I
> > >> have "blacklisted" it from AUTOSEL since indeed it's a much better
> > >> option.
> > >
> > > Hrm, it's definitely getting a bunch of stuff for my subsystems
> > > where I do tag things for stable...
> > >
> > >>>> This came up in the last MS, and the agreement there was that we expect
> > >>>> stable kernel users to test their workloads before throwing it into
> > >>>> production.
> > >
> > >>> That's kind of the problem - if people are doing testing and end
> > >>> up finding problems coming back in the stable kernel that's the
> > >>> sort of thing that encourages them to not just take stable en
> > >>> masse as we say they should. Part of the deal with stable is
> > >>> that it is conservative, people can trust it to be a low risk
> > >>> update. That's not happening now as far as I'm aware but it does
> > >>> worry me that it might happen.
> > >
> > >> Right, and the rate at which AUTOSEL commits are reverted is lower than
> > >> commits that are actually tagged for stable. If AUTOSEL commits on their
> > >> own were being reverted left and right I'd agree we need to tone it
> > >> down, but I don't see it happening now.
> > >
> > > I'm not sure how many people will actually report problems they
> > > experience upstream rather than just fixing things locally and
> > > just moving on. The more code is the more likely it is that one
> > > of the users will report things.
> > >
> >
> > I for my part will most definitely report any such problems, since each
> > regression in stable releases is used as an argument against merging
> > stable releases (even if the regression rate is negligible), and I am
> > very interested in getting that regression rate as close to zero as
> > possible. Reporting each and every regression is an essential part
> > of that.
>
> BTW, regarding regression: currently we have no central regression
> tracking. This is another big missing piece, and a thing to be
> discussed in KS, IMO.

Well, I think the conversation will go just like it has in the past for
this issue:
    "We need to have someone track regressions!"
    "X said they would do it but they need to be paid, any company
     willing to sponsor this?"
    {crickets}

We know we need this, we have at least one talented and capable person
to do the work, but no company is willing to step up and fund it :(

It's like where we were 5 years ago with testing, everyone knew there
was a problem, but no one was willing to do anything about it. That
time I convinced some LF member companies to start doing work within
their companies toward this, but that really doesn't solve this type of
problem as being "distributed" isn't the issue here...

thanks,

greg k-h
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 15:10 ` Greg KH
@ 2019-07-08 15:18 ` Takashi Iwai
2019-07-08 18:08 ` Sasha Levin
` (2 subsequent siblings)
3 siblings, 0 replies; 27+ messages in thread
From: Takashi Iwai @ 2019-07-08 15:18 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss

On Mon, 08 Jul 2019 17:10:40 +0200,
Greg KH wrote:
>
> On Mon, Jul 08, 2019 at 04:33:28PM +0200, Takashi Iwai wrote:
> > On Mon, 08 Jul 2019 16:05:44 +0200,
> > Guenter Roeck wrote:
> > >
> > > On 7/8/19 5:37 AM, Mark Brown wrote:
> > > > On Mon, Jul 08, 2019 at 07:02:08AM -0400, Sasha Levin wrote:
> > > >> On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:
> > > >
> > > >>> I'm not saying leave it alone, it's more a question of how
> > > >>> aggressive we are about picking up things we think might be
> > > >>> relevant fixes but haven't had some sort of domain specific
> > > >>> analysis of. Testing is a good way to mitigate the potential
> > > >>> risks here.
> > > >
> > > >> I agree, and for various subsystems and drivers where the maintainers
> > > >> volunteer their domain specific expertise to send backports to stable, I
> > > >> have "blacklisted" it from AUTOSEL since indeed it's a much better
> > > >> option.
> > > >
> > > > Hrm, it's definitely getting a bunch of stuff for my subsystems
> > > > where I do tag things for stable...
> > > >
> > > >>>> This came up in the last MS, and the agreement there was that we expect
> > > >>>> stable kernel users to test their workloads before throwing it into
> > > >>>> production.
> > > >
> > > >>> That's kind of the problem - if people are doing testing and end
> > > >>> up finding problems coming back in the stable kernel that's the
> > > >>> sort of thing that encourages them to not just take stable en
> > > >>> masse as we say they should. Part of the deal with stable is
> > > >>> that it is conservative, people can trust it to be a low risk
> > > >>> update. That's not happening now as far as I'm aware but it does
> > > >>> worry me that it might happen.
> > > >
> > > >> Right, and the rate at which AUTOSEL commits are reverted is lower than
> > > >> commits that are actually tagged for stable. If AUTOSEL commits on their
> > > >> own were being reverted left and right I'd agree we need to tone it
> > > >> down, but I don't see it happening now.
> > > >
> > > > I'm not sure how many people will actually report problems they
> > > > experience upstream rather than just fixing things locally and
> > > > just moving on. The more code is the more likely it is that one
> > > > of the users will report things.
> > > >
> > >
> > > I for my part will most definitely report any such problems, since each
> > > regression in stable releases is used as an argument against merging
> > > stable releases (even if the regression rate is negligible), and I am
> > > very interested in getting that regression rate as close to zero as
> > > possible. Reporting each and every regression is an essential part
> > > of that.
> >
> > BTW, regarding regression: currently we have no central regression
> > tracking. This is another big missing piece, and a thing to be
> > discussed in KS, IMO.
>
> Well, I think the conversation will go just like it has in the past for
> this issue:
>     "We need to have someone track regressions!"
>     "X said they would do it but they need to be paid, any company
>      willing to sponsor this?"
>     {crickets}
>
> We know we need this, we have at least one talented and capable person
> to do the work, but no company is willing to step up and fund it :(

Yeah, it's a sad deja vu...

> It's like where we were 5 years ago with testing, everyone knew there
> was a problem, but no one was willing to do anything about it. That
> time I convinced some LF member companies to start doing work within
> their companies toward this, but that really doesn't solve this type of
> problem as being "distributed" isn't the issue here...

The past attempts and their failure patterns look like a SPOF: it has
always been a load on a single person, who eventually gave up
maintaining it. More automated and distributed work would help in this
regard, I sincerely hope.

thanks,

Takashi
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 15:10 ` Greg KH
2019-07-08 15:18 ` Takashi Iwai
@ 2019-07-08 18:08 ` Sasha Levin
2019-07-08 21:31 ` Jiri Kosina
2019-07-09 15:21 ` Laura Abbott
3 siblings, 0 replies; 27+ messages in thread
From: Sasha Levin @ 2019-07-08 18:08 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 05:10:40PM +0200, Greg KH wrote:
>Well, I think the conversation will go just like it has in the past for
>this issue:
>    "We need to have someone track regressions!"
>    "X said they would do it but they need to be paid, any company
>     willing to sponsor this?"
>    {crickets}
>
>We know we need this, we have at least one talented and capable person
>to do the work, but no company is willing to step up and fund it :(

Maybe I am not clear on the role of the LF here, but why can't we get
the LF to self-fund a regression tracking project for the kernel?

Getting funding for something like this from companies is difficult.
It's hard to sell the value of something like this to managers even
though to us it's obviously *critical* (see the KernelCI case for
example), and even if a certain company secures funding, LF's method of
spinning up projects and trying to get them funded individually just
doesn't work.

--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 15:10 ` Greg KH
2019-07-08 15:18 ` Takashi Iwai
2019-07-08 18:08 ` Sasha Levin
@ 2019-07-08 21:31 ` Jiri Kosina
2019-07-09 15:44 ` Rafael J. Wysocki
2019-07-09 15:21 ` Laura Abbott
3 siblings, 1 reply; 27+ messages in thread
From: Jiri Kosina @ 2019-07-08 21:31 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss

On Mon, 8 Jul 2019, Greg KH wrote:
> Well, I think the conversation will go just like it has in the past for
> this issue:
>     "We need to have someone track regressions!"
>     "X said they would do it but they need to be paid, any company
>      willing to sponsor this?"
>     {crickets}

SUSE has actually been funding this for quite some time (back when Rafael
was doing it), but it's really tricky.

We of course realize it's a very important long-term activity, from which
everybody profits.

At the same time, you need somebody who *really deeply* understands
everything inside and around the kernel development, otherwise you get
more harm and chaos than added value out of the whole exercise.

And if you have such a person (like we had Rafael), it's unlikely that
person would want to do that work forever, and the funding company is also
losing brainpower in other, more development-related areas (like PM in
Rafael's case) at the same time.

So it's not as simple as "hey, you, company making money on linux, go pay
someone to do this".

If I remember correctly (Rafael for sure would remember better), there
were some attempts to have the regression tracking made by someone much
more juniorish, but that person got of course immediately overwhelmed.

Thanks,

--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 21:31 ` Jiri Kosina
@ 2019-07-09 15:44 ` Rafael J. Wysocki
2019-07-09 21:05 ` Takashi Iwai
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2019-07-09 15:44 UTC (permalink / raw)
To: ksummit-discuss

On Monday, July 8, 2019 11:31:45 PM CEST Jiri Kosina wrote:
> On Mon, 8 Jul 2019, Greg KH wrote:
>
> > Well, I think the conversation will go just like it has in the past for
> > this issue:
> >     "We need to have someone track regressions!"
> >     "X said they would do it but they need to be paid, any company
> >      willing to sponsor this?"
> >     {crickets}
>
> SUSE has actually been funding this for quite some time (back when Rafael
> was doing it), but it's really tricky.
>
> We of course realize it's a very important long-term activity, from which
> everybody profits.
>
> At the same time, you need somebody who *really deeply* understands
> everything inside and around the kernel development, otherwise you get
> more harm and chaos than added value out of the whole exercise.
>
> And if you have such a person (like we had Rafael), it's unlikely that
> person would want to do that work forever, and the funding company is also
> losing brainpower in other, more development-related areas (like PM in
> Rafael's case) at the same time.
>
> So it's not as simple as "hey, you, company making money on linux, go pay
> someone to do this".
>
> If I remember correctly (Rafael for sure would remember better), there
> were some attempts to have the regression tracking made by someone much
> more juniorish, but that person got of course immediately overwhelmed.

There were such attempts and yes, the people dropped the ball eventually.

Honestly, I don't agree with the idea that one person can practically
track regressions for the whole kernel today, because there are too many
potential sources of information to follow. You'd need to track all of
the mailing lists used for development, bug tracking systems in many
places and so on.

When I was tracking regressions, it was more or less sufficient to
follow the LKML, and that was hard enough already at that time, but it
is not sufficient any more (and even the LKML itself has become much
more of a fire hose since then).

The tracking of regressions, to be effective, would need to scale at
least in the same way as the development process does IMO.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-09 15:44 ` Rafael J. Wysocki
@ 2019-07-09 21:05 ` Takashi Iwai
0 siblings, 0 replies; 27+ messages in thread
From: Takashi Iwai @ 2019-07-09 21:05 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: ksummit-discuss

On Tue, 09 Jul 2019 17:44:13 +0200,
Rafael J. Wysocki wrote:
>
> On Monday, July 8, 2019 11:31:45 PM CEST Jiri Kosina wrote:
> > On Mon, 8 Jul 2019, Greg KH wrote:
> >
> > > Well, I think the conversation will go just like it has in the past for
> > > this issue:
> > >     "We need to have someone track regressions!"
> > >     "X said they would do it but they need to be paid, any company
> > >      willing to sponsor this?"
> > >     {crickets}
> >
> > SUSE has actually been funding this for quite some time (back when Rafael
> > was doing it), but it's really tricky.
> >
> > We of course realize it's a very important long-term activity, from which
> > everybody profits.
> >
> > At the same time, you need somebody who *really deeply* understands
> > everything inside and around the kernel development, otherwise you get
> > more harm and chaos than added value out of the whole exercise.
> >
> > And if you have such a person (like we had Rafael), it's unlikely that
> > person would want to do that work forever, and the funding company is also
> > losing brainpower in other, more development-related areas (like PM in
> > Rafael's case) at the same time.
> >
> > So it's not as simple as "hey, you, company making money on linux, go pay
> > someone to do this".
> >
> > If I remember correctly (Rafael for sure would remember better), there
> > were some attempts to have the regression tracking made by someone much
> > more juniorish, but that person got of course immediately overwhelmed.
>
> There were such attempts and yes, the people dropped the ball eventually.
>
> Honestly, I don't agree with the idea that one person can practically
> track regressions for the whole kernel today, because there are too many
> potential sources of information to follow. You'd need to track all of
> the mailing lists used for development, bug tracking systems in many
> places and so on.
>
> When I was tracking regressions, it was more or less sufficient to
> follow the LKML, and that was hard enough already at that time, but it
> is not sufficient any more (and even the LKML itself has become much
> more of a fire hose since then).
>
> The tracking of regressions, to be effective, would need to scale at
> least in the same way as the development process does IMO.

Agreed. And I believe the key is to establish a standard way to
report a regression from each subsystem maintainer's side. That is,
instead of a "regression manager" gathering the regression reports
alone by him/herself, we let each maintainer report regressions more
easily to a central place. And there we simply gather and provide the
link to each regression report on a dashboard.

Of course, this would require educating each maintainer and developer,
but it should be more scalable and sustainable than a top-down model.

thanks,

Takashi
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 15:10 ` Greg KH
` (2 preceding siblings ...)
2019-07-08 21:31 ` Jiri Kosina
@ 2019-07-09 15:21 ` Laura Abbott
3 siblings, 0 replies; 27+ messages in thread
From: Laura Abbott @ 2019-07-09 15:21 UTC (permalink / raw)
To: Greg KH, Takashi Iwai; +Cc: ksummit-discuss

On 7/8/19 11:10 AM, Greg KH wrote:
> On Mon, Jul 08, 2019 at 04:33:28PM +0200, Takashi Iwai wrote:
>> On Mon, 08 Jul 2019 16:05:44 +0200,
>> Guenter Roeck wrote:
>>>
>>> On 7/8/19 5:37 AM, Mark Brown wrote:
>>>> On Mon, Jul 08, 2019 at 07:02:08AM -0400, Sasha Levin wrote:
>>>>> On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:
>>>>
>>>>>> I'm not saying leave it alone, it's more a question of how
>>>>>> aggressive we are about picking up things we think might be
>>>>>> relevant fixes but haven't had some sort of domain specific
>>>>>> analysis of. Testing is a good way to mitigate the potential
>>>>>> risks here.
>>>>
>>>>> I agree, and for various subsystems and drivers where the maintainers
>>>>> volunteer their domain specific expertise to send backports to stable, I
>>>>> have "blacklisted" it from AUTOSEL since indeed it's a much better
>>>>> option.
>>>>
>>>> Hrm, it's definitely getting a bunch of stuff for my subsystems
>>>> where I do tag things for stable...
>>>>
>>>>>>> This came up in the last MS, and the agreement there was that we expect
>>>>>>> stable kernel users to test their workloads before throwing it into
>>>>>>> production.
>>>>
>>>>>> That's kind of the problem - if people are doing testing and end
>>>>>> up finding problems coming back in the stable kernel that's the
>>>>>> sort of thing that encourages them to not just take stable en
>>>>>> masse as we say they should. Part of the deal with stable is
>>>>>> that it is conservative, people can trust it to be a low risk
>>>>>> update. That's not happening now as far as I'm aware but it does
>>>>>> worry me that it might happen.
>>>>
>>>>> Right, and the rate at which AUTOSEL commits are reverted is lower than
>>>>> commits that are actually tagged for stable. If AUTOSEL commits on their
>>>>> own were being reverted left and right I'd agree we need to tone it
>>>>> down, but I don't see it happening now.
>>>>
>>>> I'm not sure how many people will actually report problems they
>>>> experience upstream rather than just fixing things locally and
>>>> just moving on. The more code is the more likely it is that one
>>>> of the users will report things.
>>>>
>>>
>>> I for my part will most definitely report any such problems, since each
>>> regression in stable releases is used as argument against merging
>>> stable releases (even if the regression rate is negligible), and I am
>>> very interested in getting that regression rate as close to zero as
>>> possible. Reporting each and every regression is an essential part
>>> of that.
>>
>> BTW, regarding regression: currently we have no central regression
>> tracking. This is another big missing piece, and a thing to be
>> discussed in KS, IMO.
>
> Well, I think the conversation will go just like it has in the past for
> this issue:
>     "We need to have someone track regressions!"
>     "X said they would do it but they need to be paid, any company
>     willing to sponsor this?"
>     {crickets}
>
> We know we need this, we have at least one talented and capable person
> to do the work, but no company is willing to step up and fund it :(
>
> It's like where we were 5 years ago with testing, everyone knew there
> was a problem, but no one was willing to do anything about it. That
> time I convinced some LF member companies to start doing work within
> their companies toward this, but that really doesn't solve this type of
> problem as being "distributed" isn't the issue here...
>
> thanks,
>
> greg k-h

There are two parts here: a centralized place to track bugs and
regressions, and a person to help manage those. While having a person
to manage everything would be good, getting the central tracking going
without relying on a single person is important.

Thanks,
Laura
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 14:05 ` Guenter Roeck
2019-07-08 14:33 ` Takashi Iwai
@ 2019-07-08 14:50 ` Mark Brown
2019-07-08 15:06 ` Greg KH
1 sibling, 1 reply; 27+ messages in thread
From: Mark Brown @ 2019-07-08 14:50 UTC (permalink / raw)
To: Guenter Roeck; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 07:05:44AM -0700, Guenter Roeck wrote:
> On 7/8/19 5:37 AM, Mark Brown wrote:

> > I'm not sure how many people will actually report problems they
> > experience upstream rather than just fixing things locally and
> > just moving on. The more code is the more likely it is that one
> > of the users will report things.

> I for my part will most definitely report any such problems, since each
> regression in stable releases is used as argument against merging
> stable releases (even if the regression rate is negligible), and I am
> very interested in getting that regression rate as close to zero as
> possible. Reporting each and every regression is an essential part
> of that.

Me too - but I'm pretty sure for example most of the product
teams I've worked with at consumer electronics companies would
never even consider it.
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 14:50 ` Mark Brown
@ 2019-07-08 15:06 ` Greg KH
2019-07-08 15:27 ` Mark Brown
0 siblings, 1 reply; 27+ messages in thread
From: Greg KH @ 2019-07-08 15:06 UTC (permalink / raw)
To: Mark Brown; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 03:50:45PM +0100, Mark Brown wrote:
> On Mon, Jul 08, 2019 at 07:05:44AM -0700, Guenter Roeck wrote:
> > On 7/8/19 5:37 AM, Mark Brown wrote:
>
> > > I'm not sure how many people will actually report problems they
> > > experience upstream rather than just fixing things locally and
> > > just moving on. The more code is the more likely it is that one
> > > of the users will report things.
>
> > I for my part will most definitely report any such problems, since each
> > regression in stable releases is used as argument against merging
> > stable releases (even if the regression rate is negligible), and I am
> > very interested in getting that regression rate as close to zero as
> > possible. Reporting each and every regression is an essential part
> > of that.
>
> Me too - but I'm pretty sure for example most of the product
> teams I've worked with at consumer electronics companies would
> never even consider it.

Sweet, want me to come into those teams and give a presentation like I
did a few months ago for one major company entitled "all the ways your
kernel is insecure and trivial to break"?

I'll be glad to do so :)

thanks,

greg k-h
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 15:06 ` Greg KH
@ 2019-07-08 15:27 ` Mark Brown
0 siblings, 0 replies; 27+ messages in thread
From: Mark Brown @ 2019-07-08 15:27 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 05:06:41PM +0200, Greg KH wrote:
> On Mon, Jul 08, 2019 at 03:50:45PM +0100, Mark Brown wrote:
> > On Mon, Jul 08, 2019 at 07:05:44AM -0700, Guenter Roeck wrote:

> > > I for my part will most definitely report any such problems, since each
> > > regression in stable releases is used as argument against merging
> > > stable releases (even if the regression rate is negligible), and I am
> > > very interested in getting that regression rate as close to zero as
> > > possible. Reporting each and every regression is an essential part
> > > of that.

> > Me too - but I'm pretty sure for example most of the product
> > teams I've worked with at consumer electronics companies would
> > never even consider it.

> Sweet, want me to come into those teams and give a presentation like I
> did a few months ago for one major company entitled "all the ways your
> kernel is insecure and trivial to break"?

Go wild! Note that this isn't a case of people not taking
updates, it's often a combination of a general confidentiality
mindset and the fact that if you're taking updates from multiple
sources (eg, LTS and one or more chip vendors) as well as making
your own changes it can be more trouble than it's worth to figure
out where to report anything.
* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement
2019-07-08 12:37 ` Mark Brown
2019-07-08 14:05 ` Guenter Roeck
@ 2019-07-08 18:01 ` Sasha Levin
1 sibling, 0 replies; 27+ messages in thread
From: Sasha Levin @ 2019-07-08 18:01 UTC (permalink / raw)
To: Mark Brown; +Cc: ksummit-discuss

On Mon, Jul 08, 2019 at 01:37:33PM +0100, Mark Brown wrote:
>On Mon, Jul 08, 2019 at 07:02:08AM -0400, Sasha Levin wrote:
>> On Sat, Jul 06, 2019 at 01:32:14AM +0100, Mark Brown wrote:
>
>> > I'm not saying leave it alone, it's more a question of how
>> > aggressive we are about picking up things we think might be
>> > relevant fixes but haven't had some sort of domain specific
>> > analysis of. Testing is a good way to mitigate the potential
>> > risks here.
>
>> I agree, and for various subsystems and drivers where the maintainers
>> volunteer their domain specific expertise to send backports to stable, I
>> have "blacklisted" it from AUTOSEL since indeed it's a much better
>> option.
>
>Hrm, it's definitely getting a bunch of stuff for my subsystems
>where I do tag things for stable...

You still need to explicitly ask me to blacklist it, but I'm more than
happy to if you feel the AUTOSEL process doesn't add value. Some
maintainers choose to keep AUTOSEL but just respond with "NAK" on
patches they don't want in.

>> > > This came up in the last MS, and the agreement there was that we expect
>> > > stable kernel users to test their workloads before throwing it into
>> > > production.
>
>> > That's kind of the problem - if people are doing testing and end
>> > up finding problems coming back in the stable kernel that's the
>> > sort of thing that encourages them to not just take stable en
>> > masse as we say they should. Part of the deal with stable is
>> > that it is conservative, people can trust it to be a low risk
>> > update. That's not happening now as far as I'm aware but it does
>> > worry me that it might happen.
>
>> Right, and the rate at which AUTOSEL commits are reverted is lower than
>> commits that are actually tagged for stable. If AUTOSEL commits on their
>> own were being reverted left and right I'd agree we need to tone it
>> down, but I don't see it happening now.
>
>I'm not sure how many people will actually report problems they
>experience upstream rather than just fixing things locally and
>just moving on. The more code is the more likely it is that one
>of the users will report things.
>
>> > > If we were to start avoiding driver updates, it would act as an
>> > > incentive for people not to upgrade their kernel.
>
>> > I'm not sure I follow the logic here?
>
>> The way I see it, the lower your "effective delta" is between two
>> kernels, the easier it is to move forward. For example, if I have a
>> product that runs on 4.19 and uses all our core kernel code + 10
>> drivers, and I know that those drivers had most of the fixes backported
>> to my LTS tree, I'd feel much more confident going to 5.4 knowing that
>> I already have most of the patches that come with 5.4.
>
>I see, that's definitely a new one to me. The concerns people
>usually have about upgrading are more around the core kernel
>changing performance characteristics or something in a way that
>disrupts important workloads. I'm not quite sure I follow the
>logic there TBH, it seems to be discounting new development
>rather too much - even if the drivers have been very static
>there's all the integration with the rest of the kernel to think
>about.

My thinking is that we will need to address new core kernel developments
either way, which is why I haven't mentioned them here. The variable
cost here is how much effort will go into validating my hardware devices
and the code that runs them.

>> For me it's a matter of how one would budget a move from a kernel X LTS
>> to kernel Y LTS, and I think that as that budget requirement grows it's
>> actually harder to actually do it (and convince management), acting as a
>> negative incentive to stay with whatever works now.
>
>If the drivers are static enough to only be getting bug fixes
>surely the rest of the kernel is a massively more substantial
>concern?

They're not too static, and sadly them being less tested means I'm more
worried about drivers than core kernel code. Sure, the core kernel is
also a concern but as I've mentioned above, you will pay the price for
re-testing core kernel stuff anyway.

--
Thanks,
Sasha
end of thread, other threads:[~2019-07-09 21:05 UTC | newest]

Thread overview: 27+ messages
2019-07-03 1:35 [Ksummit-discuss] [MAINTAINERS SUMMIT] stable kernel process automation and improvement Sasha Levin
2019-07-03 14:57 ` Laura Abbott
2019-07-05 13:54 ` Michael Ellerman
2019-07-05 14:13 ` Takashi Iwai
2019-07-05 16:17 ` Greg KH
2019-07-05 16:52 ` Sasha Levin
2019-07-05 16:41 ` Mark Brown
2019-07-05 20:12 ` Sasha Levin
2019-07-06 0:32 ` Mark Brown
2019-07-08 11:02 ` Sasha Levin
2019-07-08 11:35 ` Jiri Kosina
2019-07-08 12:34 ` Greg KH
2019-07-08 17:56 ` Sasha Levin
2019-07-08 12:37 ` Mark Brown
2019-07-08 14:05 ` Guenter Roeck
2019-07-08 14:33 ` Takashi Iwai
2019-07-08 15:10 ` Greg KH
2019-07-08 15:18 ` Takashi Iwai
2019-07-08 18:08 ` Sasha Levin
2019-07-08 21:31 ` Jiri Kosina
2019-07-09 15:44 ` Rafael J. Wysocki
2019-07-09 21:05 ` Takashi Iwai
2019-07-09 15:21 ` Laura Abbott
2019-07-08 14:50 ` Mark Brown
2019-07-08 15:06 ` Greg KH
2019-07-08 15:27 ` Mark Brown
2019-07-08 18:01 ` Sasha Levin