Hi,

On 05/15/2014 07:11 PM, Josh Boyer wrote:
> On Thu, May 15, 2014 at 1:01 PM, Levente Kurusa <levex@linux.com> wrote:
>> Hi,
>>
>> On Tue, May 13, 2014 at 09:14:48PM -0400, Josh Boyer wrote:
>>> On Tue, May 13, 2014 at 12:07 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>>>> I'll note this discussion has started mutating to a more general "how
>>>> do we get more useful bug reports in front of developers", which I
>>>> think is a good thing.
>>>
>>> Yes, agreed.
>>>
>>>> However, I'm still not sure how useful it would be to have a tech
>>>> topic (or a core topic) dedicated to the matter, because we've had
>>>> discussions about and at the end of the day, what's probably really
>>>> necessary is to have someone, or a small team, dedicated all or most
>>>> of their time to:
>>>>
>>>> a) improving kerneloops.org
>>>> b) finding interesting patterns in the bulk reported data, and then
>>>> forwarding that on to developers
>>>> c) finding ways of automating (b)
>>>
>>> I think that's not really the full answer.  Having that setup would
>>> certainly be beneficial, but it ignores the time delay required to
>>> both 1) actually get kernel releases to large numbers of users, and 2)
>>> find said interesting patterns.
>>>
>>> The number of users testing the latest kernel release is certainly
>>> more today than ever, but the bulk of users are still using distro
>>> kernels.  Even with Fedora, Arch, and other distros rebasing rather
>>> quickly, we are still looking at that kernel release hitting the user
>>> base after the merge window is closed for the next version already
>>> (N+1).  That means the upstream kernel developers are off developing
>>> for the release after (N+2).
>>>
>>> So if a large number of users starts hitting bugs in version N, and
>>> the upstream developers are already working on changes for N+2,
>>> waiting for interesting patterns to develop is actually increasing the
>>> "cost" to the developers to go back and look at code they did 2
>>> releases ago.  The typical response is "does this recreate on Linus'
>>> latest tree or some subsystem git tree", which is N+1 in this case.
>>> It's a fair request from a developer's perspective, but it's not as
>>> simple for an end user or distro.  Often times they'll try N+1 and hit
>>> a different bug on their system, making it even more confusing to try
>>> and work the original report to conclusion.  (That appears to be Dave
>>> Jones' life in a nutshell with trinity reports at the moment, so it's
>>> not just Aunt Tillie either.)
>>>
>>> Often times the bugs are still in N+1 as well, so it's certainly
>>> helpful to report either way.  The areas where we tend to see this
>>> problem aren't in core MM or VFS code, but more in things like
>>> backlights, GPU drivers, wireless drivers, etc.  These aren't trivial
>>> areas to debug or git bisect (which is a nightmare to walk an end
>>> user through).  They are also the same areas where we depend on
>>> end-user testing and reporting because of the huge amount of
>>> variability in the hardware itself.  E.g. a change to fix something in
>>> i915 on one machine/chipset seems to inevitably break a different
>>> machine/chipset.
>>
>> One thing I would add is if a user actually reports a bug on Bugzilla
>> often times they are able to do a git bisect. What I have observed is
>> that less tech-savvy people don't even bother trying to report
>> the bug, nor would they try to do anything to fix it if the bug isn't
>> that fatal. Of course, ABRT and the like have improved the number of
>> quality bug reports, but there still exist a number of fatal bugs
>> about which we remain less informed.
> 
> Working daily in Fedora bugzilla would lead me to politely disagree
> with you.  Or perhaps note that ABRT is leading to more bug reports,
> but less technically inclined users that find bisect confusing.  Even
> some of the more technically inclined users tend to ask for pre-built
> RPMs for bisect purposes, which isn't particularly easy.

I guess that more critical issues are reported by more people, so chances
are that at least one of them is capable of doing a bisect. Less critical
issues might go un-bisected, but I guess that is what people call
'collateral damage'.

> 
>> Naturally a question arises. How could we get technically not-so-capable
>> people to report bugs? I guess QR codes have become so mainstream
>> nowadays, that they can provide a solution. What I see nowadays is
> 
> Maybe.  I've yet to see people actually use QR codes for anything, but
> the technology certainly exists.

Well, here in Hungary, QR codes have started to appear in quite a few places.
Today, I saw one on a restaurant's menu, and it took me to a page where I
could view the particular dish from any angle. They have also appeared on
the backs of vehicles, in TV commercials, and in a lot of other places.

> 
>> that when people see a QR code, they almost automatically try to scan
>> it. Not to mention when they have nothing else to do as with a kernel
>> crash. :-)
> 
> We live in amazingly different worlds.

It could be that I am surrounded by tech-savvy people, but QR codes are
certainly becoming mainstream.

> 
> Anyway, my side discussion was not intended to discourage the QR code
> progress.  By all means, those interested should pursue that because
> anything helps.

Right.

> 
>>> I realize QR codes, kerneloops.org, and things of that nature aren't
>>> going to solve this problem.  That's kind of why I'd like to see it
>>> discussed more broadly, and not assume that it can be automated away.
>>> I'm just concerned that the rate of development today is outpacing our
>>> ability to get the releases into user's hands and get valid and useful
>>> bug reports from them.
>>
>> Indeed, by the time the people's favourite distro rebases, we are
>> already working on a new release. This mostly does not tend to be
>> a very bad delay with bleeding-edge distros like Arch and Fedora,
>> but what I am concerned about is what happens with the conservative
>> distros like Debian. AFAIK, the latest version of Debian is still
>> shipping the 3.2 stable kernel. If we start receiving reports from the
>> 3.2 kernel, which is N-13 at the moment, then chances are the reports
>> are less useful, since the subsystems have evolved so much since 3.2.
> 
> Right.  Debian is probably the "worst" case scenario here, because
> they use a very old kernel but aren't in the same position as the
> other users of old kernels, which are the big Enterprise vendors.  The
> EL kernels have lots of staff to deal with this problem, so I'm kind
> of excluding them from this aspect of the conversation, but Debian is
> certainly impacted.

... and I guess this won't be fixed by QR codes or any other bug-reporting
mechanism, since those reports might only be useful to the stable maintainers.
Other maintainers cannot be expected to remember their code well enough to
triage bugs in version N-13 or so...


Thanks,
    Levente Kurusa