From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Thorsten Leemhuis <linux@leemhuis.info>,
"ksummit@lists.linux.dev" <ksummit@lists.linux.dev>
Subject: Re: [MAINTAINERS SUMMIT] [0/4] Common scenario for four proposals regarding regressions
Date: Thu, 20 Jun 2024 08:57:29 -0400
Message-ID: <ead819d8bc59bd188bf4c07b3604a4aa5a194d8d.camel@HansenPartnership.com>
In-Reply-To: <c4db6faa-89ac-4f1c-ac87-1db8f91ac480@leemhuis.info>

On Thu, 2024-06-20 at 12:32 +0200, Thorsten Leemhuis wrote:
> On 18.06.24 16:43, James Bottomley wrote:
> > On Thu, 2024-06-13 at 10:22 +0200, Thorsten Leemhuis wrote:
> > > Lo! I prepared four proposals for the maintainers summit
> > > regarding regressions I'll send in reply to this mail. They are
> > > somewhat related and address different aspects of one scenario I
> > > see frequently in different variations; so instead of repeating
> > > that scenario in slightly modified form in each of the proposals,
> > > I'm putting it out here once:
> >
> > I think you're missing a piece here about how we actually find
> > regressions. A lot, it is true, come from test suites running on
> > the mainline.
>
> Sure.
>
> > However, for obscure drivers and even some more complex
> > dependencies, the regression sometimes isn't discovered until it
> > gets into the hands of the wider pool of testers, often via stable.
> >
> > This is important, because it emphasizes that zero regressions in
> > stable is impossible (and thus preventing backporting patches that
> > cause regressions is also impossible) if stable is the vehicle by
> > which some regressions are discovered.
>
> Of course "Zero regressions in stable is impossible" as we are
> dealing with software. ;) And of course even with delayed backport
> for non-urgent fixes some problems would make it through.
>
> But right now users testing mainline sometimes hardly have a chance
> to test and report problems with mainline in time to prevent a
> backport. Take Linux 6.7.2 (released 2024-01-25 23:58 UTC) with its
> 640 changes for example, where users had only 4 days to do so, as
> almost all of its changes had been merged for 6.8-rc1 (2024-01-21
> 22:23 UTC). FWIW: 200 of those changes were committed to some
> subsystem git tree during January, 363 during December, 70 during
> November, and 7 during October.

I did make this point here:

https://lore.kernel.org/all/7794a2b09ae4fa73ac35fdaec4858145a665efea.camel@HansenPartnership.com/

namely that merge window fixes should be delayed. Not because I think
a longer soak in main would allow us to find many more bugs, but simply
because not delaying them was causing reports in the merge window that
weren't handled, since people had other things to do. The reply was
that they're already doing it and, when I looked, they actually started
doing it for the 6.9 merge window (so your 6.7 example is probably out
of date).

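For reference, the arithmetic behind the 6.7.2 figures quoted above can
be checked with a few lines of Python. This is only a sketch: the
timestamps and per-month counts are the ones from the quoted text, and
the variable names are illustrative.

```python
from datetime import datetime, timezone

# Release timestamps as quoted above (UTC).
rc1_released = datetime(2024, 1, 21, 22, 23, tzinfo=timezone.utc)      # 6.8-rc1
stable_released = datetime(2024, 1, 25, 23, 58, tzinfo=timezone.utc)   # 6.7.2

# Time mainline testers had between 6.8-rc1 and the 6.7.2 backports.
window = stable_released - rc1_released

# Month in which each change was committed to a subsystem tree, as quoted.
committed = {"January": 200, "December": 363, "November": 70, "October": 7}
total = sum(committed.values())
before_january = total - committed["January"]

print(f"test window: {window}")
print(f"total changes: {total}, committed before January: {before_january}")
```

The window comes out at a little over four days, and the per-month
counts add up to the 640 changes mentioned for 6.7.2.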
> So if those 440 fixes could wait some time to be mainlined and were
> not important enough to get into 6.7 (2024-01-07 20:29 UTC) in the
> first place, why the rush backporting them to 6.7.y so quickly after
> the merge window?
>
> All that leads to the related question "How many of those changes
> maybe should have gone into 6.7?". And maybe even "Should we somehow
> try to motivate more people to try -next?".

Actually, if we got more people to try mainline we could perhaps find
more bugs. Testing -next is problematic because its instability makes
things like bisection and updating to the next release difficult.

> But those are different problems.
> And the situation regarding the first already got somewhat better
> from what I can see -- among others afaics due to me prodding people
> when the queue fixes for recent regression for the -next merge
> window.

Yes, that's why I was asking for stats on 6.9 and 6.10, where this
delay policy was apparently in place.

> > Plus it also means that a backport
> > delay or cadence would actually delay discovery of some regressions
> > because the patches that cause them won't be seen by the configs
> > that run into them until they get put into stable.
>
> And why is that a problem?

Because a regression we haven't found yet is still a regression. If
all we cared about was minimizing the regression stats, we could simply
not look for any of them. But we do care about this, so we need to
support all our mechanisms for finding them, and the point I was making
is that one such mechanism is the early backports to stable. There is
probably a sweet spot backport delay for regressions we do eventually
find in main, but for regressions that others only find in stable (and
that would never have been found in main however long we delayed),
arbitrary delays merely increase the time to finding them.

Perhaps one thing we should track with regressions is time to
discovery, and also ask about the ones found in stable whether they
could have been found in mainline? That would give us more data for
tuning the backport delay.

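To make the tracking idea concrete, here is a minimal sketch of what
such a statistic could look like. Everything in it is an assumption for
illustration: the Regression record, its field names, and the choice to
measure time to discovery from the date the culprit was merged in
mainline to the date of the first report. No existing tracker is
implied to store data this way.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Regression:
    """Hypothetical record for one tracked regression."""
    merged: date                  # assumed: date the culprit was merged in mainline
    reported: date                # date of the first report
    found_in: str                 # "mainline" or "stable"
    findable_in_mainline: bool    # reporter's answer: could mainline testing have hit it?


def time_to_discovery(reg: Regression) -> int:
    """Days between the culprit being merged and the first report."""
    return (reg.reported - reg.merged).days


def median_discovery_by_tree(regs: list[Regression]) -> dict[str, float]:
    """Median time to discovery, grouped by the tree the report came from."""
    by_tree: dict[str, list[int]] = {}
    for reg in regs:
        by_tree.setdefault(reg.found_in, []).append(time_to_discovery(reg))
    return {tree: median(days) for tree, days in by_tree.items()}


def stable_only_share(regs: list[Regression]) -> float:
    """Fraction of stable-found regressions that mainline testing could not have found."""
    stable = [reg for reg in regs if reg.found_in == "stable"]
    if not stable:
        return 0.0
    return sum(not reg.findable_in_mainline for reg in stable) / len(stable)
```

A high stable_only_share would suggest that lengthening the backport
delay mostly postpones discovery, while a low one would suggest the
delay has room to grow.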
> > [...]
> >
> > The other thing I think would help is better tooling and advice to
> > help reporters find regressions in stable. What we do a lot
> > upstream is ask if they can reproduce it in mainline. However, not
> > everyone is equipped to test out mainline kernels, so we could do
> > with helping them bisect it in stable
>
> FWIW Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
> /
> https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
> covers this: users that notice a regression in a stable tree will
> bisect that tree. But before...

Some do, but realistically the best others can do is report that the
bug was in X.Y.Z but not in X.Y.Z-1, because they can't build their own
kernels.

> > (note this can be time dependent: older stable
> > trees more naturally give rise to the question "has this been fixed
> > upstream" making mainline testing more of an imperative).
>
> ...it does so, but tells users to try mainline for two reasons:
> * It might be fixed there already.
> * When Greg receives a regression report for stable he'll usually ask
> "is mainline also affected" anyway to figure out if this is something
> he or somebody else has to look into. And some of the mainline
> developer will ask this, too.

Again, I'm not saying that's wrong, just saying we must accept that
some bugs will only be found in stable, and thus we could do with
improving our tooling to help stable users pinpoint the backport that
caused them.

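As a rough sketch of what such tooling could start from: a report of
"bad in X.Y.Z, good in X.Y.Z-1" already bounds the candidates to the
backports between the two release tags, and a path filter narrows them
further. The helper names below are hypothetical; the git invocation
uses only standard git log options (--reverse, --format, a revision
range and a pathspec). The search is the same halving that git bisect
performs, and it assumes someone (the reporter, a distro, or a test
farm) can say whether a given commit is good or bad.

```python
from typing import Callable, Optional, Sequence


def candidate_backports_cmd(good_tag: str, bad_tag: str,
                            path: Optional[str] = None) -> list[str]:
    """Build a git command listing commits in good_tag..bad_tag, oldest first.

    Hypothetical helper: it only constructs the argument list, so the
    caller decides where and how to run it (e.g. with subprocess.run).
    """
    cmd = ["git", "log", "--reverse", "--format=%H", f"{good_tag}..{bad_tag}"]
    if path is not None:
        cmd += ["--", path]   # limit to commits touching this path
    return cmd


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit in an oldest-first list.

    Assumes the last commit is bad and that every commit after the
    culprit is bad too, the same assumption git bisect makes.
    """
    if not commits:
        raise ValueError("no candidate commits")
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo]
```

For example, candidate_backports_cmd("v6.7.1", "v6.7.2", "drivers/scsi")
yields the command that lists only the 6.7.2 backports touching that
directory, which is often a short enough list to inspect by hand.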
James