From: Thorsten Leemhuis <linux@leemhuis.info>
To: "ksummit@lists.linux.dev" <ksummit@lists.linux.dev>
Subject: Re: [MAINTAINERS SUMMIT] [1/4] Create written down guidelines for handling regressions
Date: Thu, 12 Sep 2024 15:33:59 +0200
Message-ID: <aea2022a-e2b4-4d38-95db-c0006e6a7146@leemhuis.info>
In-Reply-To: <e44af14b-1c5d-479c-8752-8f4d52a00c63@leemhuis.info>

On 13.06.24 10:26, Thorsten Leemhuis wrote:
> Different assumptions about the appropriate handling of regressions
> frequently lead to friction and time consuming discussions during my
> regression tracking and prodding work. That is frustrating, demotivating
> and exhausting for everyone involved and even brought us to situations
> like "then I'm stepping down as maintainer". To avoid things like this,
> I propose we try to pin down guidelines together and ideally make Linus
> bless them.
> 
> The "Expectations and best practices for fixing regressions" in
> Documentation/process/handling-regressions.rst (
> https://docs.kernel.org/process/handling-regressions.html#expectations-and-best-practices-for-fixing-regressions
> ) could be a start for such guidelines -- but I'm obviously biased here,
> as I wrote that text, so feel free to propose something new.
> 
> That text is based on generalized interpretations of statements and
> actions from Linus while keeping practical application and our workflows
> in mind -- including the maintenance of stable trees. I have no idea if
> I went too far somewhere: the submission of that text was addressed to
> Linus, but he did not react; otoh he merged it later after Greg ACKed it
> and it came to his doorstep through the docs tree.
> 
> But in the end it seems most people do not know about this text or do
> not take it for real. [...]

Lo! The discussion here rightfully exposed that the wording regarding
the stable tag was way too strong. Sorry for that, not sure how that
happened, that was not my intent. That and a few other aspects (some
from the discussions here) made me revisit the text regarding
"Expectations and best practices for fixing regressions". See below for
my current draft (the diff view is not really helpful, sorry). Note, it
should be easy to add a week or two to any sections regarding the
timing aspects; I guess that is best discussed at the summit.

As I said earlier, the text is based on generalized interpretations of
statements and actions from Linus with some interpolation. But in some
areas what I wrote might not be what Linus wants. To sort this out I'm
currently also preparing a few scenarios with related questions for the
maintainers summit audience (incl. Linus) that hopefully will help to
keep the discussion fruitful, targeted, and as short as possible. More
on that on Tuesday.

Ciao, Thorsten
---

Expectations and best practices for fixing regressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Try to quickly resolve regressions in mainline while applying reasonable care to
prevent additional problems. The appropriate balance depends on the situation;
most regressions should ideally be resolved through a fix or a revert by the
last Sunday within the next two or three weeks after the culprit was identified.

The rules of thumb below outline the appropriate procedure in more detail. The
overall goal is to prevent situations where a regression caused by a recent
change leaves users only three bad options: use a kernel with a regression that
impacts usage, switch to a different kernel series, or run an outdated and thus
potentially insecure kernel for more than two or three weeks. 

In general:

 * Prioritize work on providing, reviewing, and mainlining regression fixes over
   other upstream Linux kernel work, unless the latter concerns severe issues
   (e.g. acute security vulnerabilities, data loss, or bricked hardware).

 * Do not consider fixing regressions from the current development cycle as
   something that can wait till its end: the issue possibly prevents users or
   CI systems from testing, which might drive testers away and mask other bugs.

 * When developing a fix, apply the required care to avoid additional damage. Do
   so even when resolving a regression might take longer than outlined below --
   at least unless a revert could resolve it, as then you should opt for one.

 * Reviewers and maintainers likewise should apply the required care, but at the
   same time should try to route regression fixes quickly through the ranks.

On timing once the change causing the regression became known:

 * If the regression is severe, aim to mainline a fix within two or three work
   days and ideally before the next Sunday; do the same if it is bothering many
   users in general or most people in prevalent environments (say a widespread
   hardware device, a popular Linux distribution, or a stable/longterm series).

 * Aim to mainline a fix by the Sunday after next, if the culprit made it into
   a kernel deemed for end users during the past three months -- either directly
   through a mainline release or through backports to stable or longterm series.
   If the culprit became known early during a week while being simple to resolve
   using a low-risk patch, try to mainline the fix within the same week instead.

 * For other regressions introduced during the past twelve months, aim to
   mainline a fix before the hindmost Sunday within the next three weeks. One or
   two weeks later are acceptable, if the regression is unlikely to bother more
   than a user or two or is something people can easily live with temporarily.

 * Try your best to mainline a fix before the current development cycle ends,
   unless the culprit was committed more than a year ago: then it is acceptable
   to queue a fix for the next merge window, which definitely should be done in
   case it bears bigger risks.
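
The timing rules of thumb above can be illustrated with a small date
calculation. This is only a sketch under assumptions: the function names and
the category strings ("severe", "recent_release", "past_year", "older") are
hypothetical labels for the four bullets, not terms from the kernel
documentation, and the three-week window is the value from this draft, which
might still change.

```python
from datetime import date, timedelta
from typing import Optional


def next_sunday(day: date) -> date:
    """Return the first Sunday strictly after `day`."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_ahead = (6 - day.weekday()) % 7
    return day + timedelta(days=days_ahead or 7)


def last_sunday_within(day: date, weeks: int) -> date:
    """Return the last Sunday on or before `day` plus `weeks` weeks."""
    end = day + timedelta(weeks=weeks)
    return end - timedelta(days=(end.weekday() + 1) % 7)


def target_sunday(identified: date, category: str) -> Optional[date]:
    """Map a (hypothetical) regression category to the Sunday to aim for.

    Returns None when queuing the fix for the next merge window is acceptable.
    """
    if category == "severe":
        # Ideally fixed before the next Sunday.
        return next_sunday(identified)
    if category == "recent_release":
        # Culprit reached an end-user kernel in the past three months.
        return next_sunday(next_sunday(identified))
    if category == "past_year":
        # Other regressions introduced during the past twelve months.
        return last_sunday_within(identified, weeks=3)
    if category == "older":
        return None
    raise ValueError(f"unknown category: {category!r}")


if __name__ == "__main__":
    identified = date(2024, 9, 12)  # a Thursday
    for category in ("severe", "recent_release", "past_year", "older"):
        print(category, target_sunday(identified, category))
```

The sketch deliberately ignores the softer parts of the rules, such as the
"two or three work days" goal for severe regressions and the extra week or two
that is acceptable for regressions bothering hardly anyone.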

On patch flow to mainline:

 * Developers, when trying to reach the time periods mentioned above, remember
   to account for the time it will take to test, review, commit, and mainline
   fixes, ideally with them being in linux-next at least briefly. Hence, if
   fixes are urgent, make it obvious to ensure others handle them appropriately.

 * Reviewers, you are kindly asked to assist developers in reaching the time
   periods mentioned above by reviewing regression fixes in a timely manner.

 * Maintainers, you likewise are kindly asked to expedite the handling
   of regression fixes. Thus when beneficial evaluate if skipping linux-next
   might be an option. Also consider sending git pull requests more often than
   usual when appropriate. And try to avoid holding onto regression fixes over
   weekends -- especially when some are marked for backporting to stable series.

On procedure:

 * If a regression seems tangly, precarious or urgent, consider CCing Linus on
   discussions or patch review; do the same if the responsible maintainers are
   suspected to be unavailable.

 * For an urgent regression, consider asking Linus to pick up a fix straight
   from the mailing list: he is totally fine with that for uncontroversial
   fixes. Such requests should ideally come directly from maintainers or happen
   in accordance with them.

 * In case you are unsure if a fix is worth the risk of applying it just days
   before a new mainline release, send Linus a mail with the usual lists and
   developers in CC; in it, summarize the situation while asking to pick up the
   fix straight from the list. Linus then can make the call and when appropriate
   even postpone the release. Such requests again should ideally come directly
   from maintainers or happen in accordance with them.

On tagging in the patch description:

 * Include the tags Documentation/process/submitting-patches.rst mentions for
   regressions; this usually means a "Reported-by:" tag followed by a "Link:" or
   "Closes:" tag pointing to the report as well as a "Fixes:" tag; if it's a
   regression a later change exposed, add a "Fixes:" tag for that one, too.

 * Did the culprit make it into a proper mainline release during the past twelve
   months? Or is it a recent mainline commit backported to stable or longterm
   releases in the past few weeks? Then you are kindly asked to ensure stable
   inclusion as described by Documentation/process/stable-kernel-rules.rst, e.g.
   by adding a "Cc: stable@vger.kernel.org" to the patch description. Note, a
   "Fixes:" tag alone does not guarantee a backport: the stable team sometimes
   silently drops such changes, for example when they do not apply cleanly.
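
As an illustration of the tagging points above, a patch description could be
checked for the mentioned tags with a few lines of Python. This is a rough
sketch, not an existing kernel tool: the function name, the `want_stable`
parameter, and the sample patch description (names, URL, and commit id) are
all made up for the example.

```python
import re


def missing_regression_tags(message: str, want_stable: bool = False) -> list:
    """Return the regression-related tags absent from a patch description."""
    # Collect every "Tag:" that starts a line, compared case-insensitively.
    tags = {
        match.group(1).lower()
        for match in re.finditer(r"^([A-Za-z-]+):", message, re.MULTILINE)
    }
    missing = []
    if "reported-by" not in tags:
        missing.append("Reported-by:")
    if not {"link", "closes"} & tags:
        missing.append("Link: or Closes:")
    if "fixes" not in tags:
        missing.append("Fixes:")
    # Only look for the stable Cc when the caller considers a backport needed.
    stable_cc = re.search(
        r"^Cc:.*stable@vger\.kernel\.org", message, re.MULTILINE | re.IGNORECASE
    )
    if want_stable and not stable_cc:
        missing.append("Cc: stable@vger.kernel.org")
    return missing


# Entirely hypothetical patch description, used for illustration only.
EXAMPLE = """\
example: fix a hypothetical regression

Reported-by: Jane Doe <jane@example.org>
Closes: https://example.org/reports/1
Fixes: 0123456789ab ("example: hypothetical culprit")
Cc: stable@vger.kernel.org
"""

if __name__ == "__main__":
    print(missing_regression_tags(EXAMPLE, want_stable=True))
```

The check only looks at which tags are present; it does not verify that the
"Link:" or "Closes:" tag directly follows "Reported-by:", nor whether a stable
backport is actually warranted for the culprit in question.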

Regarding stable and longterm kernels:

 * When receiving reports about regressions in recent stable or longterm kernel
   series, please consider evaluating at least briefly, if the issue might
   happen in current mainline as well -- and if that seems likely, take hold of
   the report. If in doubt, ask the reporter to check mainline.

 * You are free to leave handling regressions to the stable team, if the problem
   at no point in time occurred with mainline or was fixed there already.

 * Whenever you want to swiftly resolve a mainline regression that recently made
   it into a mainline, stable, or longterm release, fix it quickly in mainline;
   in urgent cases thus involve Linus to fast-track fixes (see above). That's
   required, as the stable team normally neither reverts nor fixes any changes
   in their trees as long as those cause the same problem in mainline.

 * In case of urgent fixes for regressions affecting stable or longterm
   kernels, you might want to ensure prompt backporting by dropping the stable
   team a note once the fix was mainlined; this is especially advisable during
   merge windows and shortly thereafter, as the fix otherwise might land at the
   end of a huge patch queue.
