workflows.vger.kernel.org archive mirror
* Simple analytics for docs.kernel.org and patchwork, please?
@ 2024-02-23 16:31 Jakub Kicinski
  2024-02-23 17:14 ` Mauro Carvalho Chehab
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-23 16:31 UTC (permalink / raw)
  To: workflows, Konstantin Ryabitsev; +Cc: corbet, ast

Hi!

We have a few netdev-related bots with various simple status pages.
I hooked them up to analytics recently, here's the dash:
https://plausible.io/netdev.bots.linux.dev

Plausible was described here: https://lwn.net/Articles/822568/
It's supposedly open and privacy-focused: no cookies, etc.

It's useful for me when deciding where to invest my time,
and to back up the efforts to my employer with some data.

Now, most of us agree that kernel docs leave something to be desired.
At the same time maintainers are repeatedly faced with people who post
code without reading the docs, which puts the time invested in writing
them into question. I can't help but think that providing some
analytics for docs.kernel.org traffic would be beneficial. 
I would use it.

Thoughts?

Does anyone think that even non-intrusive analytics are a no go?

Does anyone know better alternatives than Plausible?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
@ 2024-02-23 17:14 ` Mauro Carvalho Chehab
  2024-02-23 17:49 ` Jonathan Corbet
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Mauro Carvalho Chehab @ 2024-02-23 17:14 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: workflows, Konstantin Ryabitsev, corbet, ast

On Fri, 23 Feb 2024 08:31:54 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> Hi!
> 
> We have a few netdev-related bots with various simple status pages.
> I hooked them up to analytics recently, here's the dash:
> https://plausible.io/netdev.bots.linux.dev
> 
> Plausible was described here: https://lwn.net/Articles/822568/
> it's supposedly-open, and privacy-focused, no cookies etc.
> 
> It's useful for me when deciding where to invest my time,
> and to back up the efforts to my employer with some data.
> 
> Now, most of us agree that kernel docs leave something to be desired.
> At the same time maintainers are repeatedly faced with people who post
> code without reading the docs, which puts the time invested in writing
> them into question. I can't help but think that providing some
> analytics for docs.kernel.org traffic would be beneficial. 
> I would use it.
> 
> Thoughts?
> 
> Does anyone think that even non-intrusive analytics are a no go?
> 
> Does anyone know better alternatives than Plausible?

I have a small hand-made script using Pandas/Seaborn to produce
some patchwork statistics as can be seen at:

	https://linuxtv.org/patchwork_stats.php

Feel free to use it as a basis to get some stats.

You may need to modify it to produce stats per project. On Linux media there
are just two projects, and the second one gets only a handful of patches per
year, so we didn't need to filter per project.
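
For the per-project case, a minimal sketch of the extra join, assuming (not verified here) that patchwork_patch has a project_id column pointing at patchwork_project, whose linkname identifies the project. It runs against an in-memory SQLite database with a trimmed-down, hypothetical schema and made-up rows so it works anywhere; on MariaDB the month expression would be DATE_FORMAT(p.date, "%Y-%m") as in the script below, and the :name placeholders should carry over to SQLAlchemy's text().

```python
import sqlite3

# Hypothetical, trimmed-down schema containing only the columns used below.
# project_id / linkname follow Patchwork naming as an assumption; check them
# against your own database before reusing the query.
SCHEMA = """
CREATE TABLE patchwork_project (id INTEGER PRIMARY KEY, linkname TEXT);
CREATE TABLE patchwork_patch (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES patchwork_project(id),
    state_id INTEGER,
    date TEXT
);
"""

# Monthly patch count restricted to one project, SQLite spelling.
# On MariaDB/MySQL use DATE_FORMAT(p.date, "%Y-%m") instead of strftime().
PER_PROJECT_QUERY = """
SELECT strftime('%Y-%m', p.date) AS month, count(*) AS patches
FROM patchwork_patch AS p
JOIN patchwork_project AS pr ON pr.id = p.project_id
WHERE pr.linkname = :project AND p.date >= :start AND p.date < :end
GROUP BY month
ORDER BY month
"""


def patches_per_month(conn, project, start, end):
    """Return [(YYYY-MM, count), ...] for one project, start <= date < end."""
    params = {"project": project, "start": start, "end": end}
    return conn.execute(PER_PROJECT_QUERY, params).fetchall()


def build_demo_db():
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patchwork_project VALUES (?, ?)",
                     [(1, "media"), (2, "other")])
    conn.executemany(
        "INSERT INTO patchwork_patch (project_id, state_id, date) VALUES (?, ?, ?)",
        [(1, 1, "2024-01-05 10:00:00"), (1, 2, "2024-01-20 12:00:00"),
         (1, 1, "2024-02-02 08:00:00"), (2, 1, "2024-01-07 09:00:00")])
    return conn
```

For example, patches_per_month(build_demo_db(), "media", "2024-01-01", "2024-03-01") counts only the rows of the "media" project, grouped by month.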

Also, please note that the second query doesn't use any index on Patchwork 3.1.

I ended up manually creating an index to speed it up on MariaDB:

	CREATE INDEX idx_patchwork_patch_stateid_date on patchwork_patch(state_id, date);

I hope that helps.

Regards,
Mauro

---

#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) Mauro Carvalho Chehab <mchehab@kernel.org>

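import os
from datetime import datetime, date, timedelta
from matplotlib.dates import DateFormatter
from matplotlib.pyplot import xlim
from pandas import read_sql
from seaborn import relplot, set_style, axes_style
from sqlalchemy import create_engine, text

DIR = './'

def log(msg):
    now = datetime.now().strftime("%d/%m/%Y %H:%M:%S")
    print(f'{now}: {msg}')

today = date.today()

# Consider yesterday as the final date
end_date = today - timedelta(days=1)

# Two complete years + this month: start on the first day of the previous
# month, two years back. Going through the last day of the previous month
# avoids month=0 in January and invalid dates such as 31 February.
last_of_prev_month = today.replace(day=1) - timedelta(days=1)
start_date = last_of_prev_month.replace(year=last_of_prev_month.year - 2, day=1)

# "date" is a DATETIME column: compare with "< today" so that all of
# yesterday (end_date) is included, not just its first second.
interval = f'date >= "{start_date}" and date < "{today}"'

log("Connecting to database")

# Read the connection URL from the environment instead of hard-coding the
# password, e.g. mysql://patchwork:<password>@localhost/patchwork?charset=utf8mb4
# (PATCHWORK_DB_URL is a variable name chosen for this example).
engine = create_engine(os.environ["PATCHWORK_DB_URL"])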

palette = "bright"
background = "#555555"

style = {
    'axes.facecolor':background,
    'grid.color':'white',
    'axes.edgecolor': 'orange',
    'axes.labelcolor': 'orange',
    'text.color': '#ffcc00',
    'xtick.color': 'white',
    'ytick.color': 'white',
    'patch.edgecolor': 'orange',
    'figure.facecolor':'black'
}


xformatter = DateFormatter("%Y-%m")

with engine.connect() as conn:
    # Total patches
    query = text(f'select DATE_FORMAT(date, "%Y-%m") AS date, count(*) AS patches from patchwork_patch WHERE {interval} group by DATE_FORMAT(date, "%Y-%m") ORDER BY YEAR(date), MONTH(DATE)')
    log(query)
    total = read_sql(query, con=conn, parse_dates=['date'])

    log("Creating total patches graph")

    set_style(style="darkgrid",rc=style)

    g = relplot(kind="line", marker='x', markers=True, data=total, x="date", y="patches")
    g.set_axis_labels("Date", "Number of patches", labelpad=10)
    g.set(title=f'Number of patches received per month between {start_date} and {end_date}')
    g.figure.set_size_inches(14, 6)
    g.ax.margins(.05)
    g.ax.autoscale_view()
    g.despine(trim=True, left=True, bottom=True)
    xlim(start_date, end_date)
    g.axes[0,0].xaxis.set_major_formatter(xformatter)

    g.savefig(DIR + 'patches_per_date.svg')

    # Patches per state
    query = text(f'select DATE_FORMAT(date, "%Y-%m") as date, st.name as State, count(*) as patches from patchwork.patchwork_patch AS p, patchwork_state as st where state_id = st.id and {interval} group by DATE_FORMAT(date, "%Y-%m"), st.id')
    log(query)
    per_state = read_sql(query, con=conn, parse_dates=['date'])

    log("Creating patches per state")

    per_state.set_index('date', inplace=True)

    g = relplot(kind="line", data=per_state, x="date", y="patches", hue="State", markers="State", marker="X", palette=palette)

    g.set_axis_labels("Date", "Number of patches", labelpad=10)
    g.set(title=f'Number of patches per state received per month between {start_date} and {end_date}')
    g.figure.set_size_inches(13.5, 6)
    g.ax.margins(.05)
    g.ax.autoscale_view()
    g.despine(trim=True, left=True, bottom=True)
    xlim(start_date, end_date)
    g.axes[0,0].xaxis.set_major_formatter(xformatter)

    g.add_legend(loc='upper left', bbox_to_anchor=(1.12, 0.5))
    g.savefig(DIR + 'patches_per_state.svg')

    log("Done.")

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
  2024-02-23 17:14 ` Mauro Carvalho Chehab
@ 2024-02-23 17:49 ` Jonathan Corbet
  2024-02-23 20:02   ` Jakub Kicinski
  2024-02-26 19:06 ` Jakub Kicinski
  2024-02-26 19:24 ` Konstantin Ryabitsev
  3 siblings, 1 reply; 9+ messages in thread
From: Jonathan Corbet @ 2024-02-23 17:49 UTC (permalink / raw)
  To: Jakub Kicinski, workflows, Konstantin Ryabitsev; +Cc: ast

Jakub Kicinski <kuba@kernel.org> writes:

> Does anyone think that even non-intrusive analytics are a no go?

What sorts of analytics are you looking for?  Simple logfile analysis
should be fairly uncontroversial and would tell you which documents are
most of interest to the AI bots^W^Wdevelopers.  Anything requiring, say,
javascript in the browser is likely to get blocked by the kinds of
people who might be interested in kernel docs.

We did an overview of relatively innocuous analytics packages a few
years ago:

  https://lwn.net/Articles/822568/

jon

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-23 17:49 ` Jonathan Corbet
@ 2024-02-23 20:02   ` Jakub Kicinski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-23 20:02 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: workflows, Konstantin Ryabitsev, ast

On Fri, 23 Feb 2024 10:49:35 -0700 Jonathan Corbet wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> 
> > Does anyone think that even non-intrusive analytics are a no go?  
> 
> What sorts of analytics are you looking for?  Simple logfile analysis
> should be fairly uncontroversial and would tell you which documents are
> most of interest to the AI bots^W^Wdevelopers. 

Yes, basic analysis of access.log would do. I think that's equivalent
to what Plausible does. It's more a question of which existing solution
we can set up quickly; I have no preference on the method or tool we
end up using.

All we need is hit count for a subpage, with some basic dedup of
a single reader hitting refresh...
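
A minimal sketch of that kind of count, assuming the nginx/Apache "combined" log format: a page view is counted per client IP, and repeat requests for the same page from the same IP within a 30-minute window (an arbitrary choice) are treated as refreshes. The user-agent check is deliberately naive and will not catch bots that send a browser-like User-Agent; function and constant names are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Combined Log Format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Naive bot filter: misses bots that pretend to be browsers.
BOT_HINTS = ("bot", "crawler", "spider")


def count_page_views(lines, window=timedelta(minutes=30)):
    """Count views per path; repeats from one IP within `window` count as refreshes.

    Assumes the lines are in chronological order, as in a normal access.log.
    """
    last_seen = {}      # (ip, path) -> time of the previous request
    views = Counter()   # path -> deduplicated view count
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue  # unparsable, non-GET, or error response (e.g. a 404 for admin.php)
        if any(hint in m["agent"].lower() for hint in BOT_HINTS):
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        key = (m["ip"], m["path"])
        prev = last_seen.get(key)
        if prev is None or ts - prev > window:
            views[m["path"]] += 1
        last_seen[key] = ts
    return views
```

Usage would be along the lines of count_page_views(open("access.log")).most_common(20). Measuring the window from the previous request (not the previous counted view) means someone who keeps refreshing a page is counted once until they stop for longer than the window.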

> Anything requiring, say, javascript in the browser is likely to get
> blocked by the kinds of people who might be interested in kernel docs.

Interesting. I spent 20 minutes grepping netdev's access.log.
This may be confirmation bias, but the vast majority of the hits
are more or less thinly veiled bots. Unless we believe that
someone on an Android phone decided to visit "admin.php"
after landing on our page... (admin.php obviously doesn't exist)

I zeroed in on one metric: users who came from patchwork
(i.e. clicked on CI results) over the last week. Plausible counted 17;
the access log had 18 IP addresses with the right Referer.
The date ranges of the two may not match up exactly, so the small
delta is expected.
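
A rough sketch of that referrer check: count distinct client IPs, from a given time onward (by default the last seven days), whose Referer header points at the patchwork host. Only the hostname is compared, since browsers by default send just the origin on cross-site navigations. The default host, the function name and the combined log format are assumptions, and IP addresses only approximate users (NAT, changing mobile addresses).

```python
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

# Combined Log Format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referred_ips(lines, since=None, referer_host="patchwork.kernel.org"):
    """Distinct client IPs whose Referer host is referer_host, at or after `since`.

    `since` must be timezone-aware because log timestamps carry an offset;
    it defaults to seven days before now.
    """
    if since is None:
        since = datetime.now(timezone.utc) - timedelta(days=7)
    ips = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if ts >= since and urlsplit(m["referer"]).hostname == referer_host:
            ips.add(m["ip"])
    return ips
```

len(referred_ips(open("access.log"))) would then be the number to compare with the dashboard.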

After doing this exercise, I'd like to withdraw my previous statement
that "access.log analysis" is fine. Now I think it's far more likely
we'd miscount bots than that someone legit has blocked javascript...

> We did an overview of relatively innocuous analytics packages a few
> years ago:
> 
>   https://lwn.net/Articles/822568/

We need some analysis of how much of an email people actually read :)
Look at the second paragraph of my first email, where do you think 
I found Plausible if not LWN ;)

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
  2024-02-23 17:14 ` Mauro Carvalho Chehab
  2024-02-23 17:49 ` Jonathan Corbet
@ 2024-02-26 19:06 ` Jakub Kicinski
  2024-02-26 19:24 ` Konstantin Ryabitsev
  3 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-26 19:06 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: workflows, corbet, ast

On Fri, 23 Feb 2024 08:31:54 -0800 Jakub Kicinski wrote:
> Hi!
> 
> We have a few netdev-related bots with various simple status pages.
> I hooked them up to analytics recently, here's the dash:
> https://plausible.io/netdev.bots.linux.dev
> 
> Plausible was described here: https://lwn.net/Articles/822568/
> it's supposedly-open, and privacy-focused, no cookies etc.
> 
> It's useful for me when deciding where to invest my time,
> and to back up the efforts to my employer with some data.

Hi Konstantin, are you open to trying some analytics?
If yes I will go ask for approval to pay the bill.

FWIW it's not a lot of work; for netdev pages I add
the tracker with sed:
 sed -i 's@</title>$@</title><script defer data-domain="netdev.bots.linux.dev" src="https://plausible.io/js/script.js"></script>@' $file

If there's a concern that other maintainers don't want this
we can selectively sed just the networking pages?
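
If only some areas opt in, the selective variant could be sketched in Python roughly as follows. Like the sed command, it inserts the tag right after </title>, but it does not require </title> to end the line, leaves pages that already carry the tag alone, and only walks one subdirectory. The docs.kernel.org data-domain, the default "networking" subdirectory and the helper names are assumptions for illustration.

```python
from pathlib import Path

# Tag taken from the sed one-liner above, with data-domain switched to
# docs.kernel.org (an assumption; use the domain the dashboard is set up for).
SNIPPET = ('<script defer data-domain="docs.kernel.org" '
           'src="https://plausible.io/js/script.js"></script>')


def inject(html, snippet=SNIPPET):
    """Return html with snippet inserted after the first </title>.

    Pages that already contain the snippet, or have no </title>, are
    returned unchanged, so running this twice is harmless.
    """
    if snippet in html or "</title>" not in html:
        return html
    return html.replace("</title>", "</title>" + snippet, 1)


def inject_tree(root, subdir="networking"):
    """Rewrite every .html file under root/subdir in place; return how many changed.

    root is the generated HTML directory (hypothetical layout); subdir limits
    the change to one part of the documentation.
    """
    changed = 0
    for path in sorted((Path(root) / subdir).rglob("*.html")):
        text = path.read_text(encoding="utf-8")
        new_text = inject(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

Assuming the default `make htmldocs` output directory, a call such as inject_tree("Documentation/output", "networking") would touch only the networking pages.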

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
                   ` (2 preceding siblings ...)
  2024-02-26 19:06 ` Jakub Kicinski
@ 2024-02-26 19:24 ` Konstantin Ryabitsev
  2024-02-26 19:43   ` Jakub Kicinski
  3 siblings, 1 reply; 9+ messages in thread
From: Konstantin Ryabitsev @ 2024-02-26 19:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: workflows, corbet, ast

On Fri, Feb 23, 2024 at 08:31:54AM -0800, Jakub Kicinski wrote:
> Hi!
> 
> We have a few netdev-related bots with various simple status pages.
> I hooked them up to analytics recently, here's the dash:
> https://plausible.io/netdev.bots.linux.dev
> 
> Plausible was described here: https://lwn.net/Articles/822568/
> it's supposedly-open, and privacy-focused, no cookies etc.
> 
> It's useful for me when deciding where to invest my time,
> and to back up the efforts to my employer with some data.
> 
> Now, most of us agree that kernel docs leave something to be desired.
> At the same time maintainers are repeatedly faced with people who post
> code without reading the docs, which puts the time invested in writing
> them into question. I can't help but think that providing some
> analytics for docs.kernel.org traffic would be beneficial. 
> I would use it.
> 
> Thoughts?
> 
> Does anyone think that even non-intrusive analytics are a no go?

In general, my previous experience enabling libravatar on git.kernel.org has
taught me that many very vocal people *really* don't like to have any kind of
statistics gathered about them. However, if it's just for docs.kernel.org,
then I don't think I have specific objections.

That said, I would need help turning this on -- if someone can pass me along a
Sphinx configuration option that I can enable during build time, then I'll be
happy to add it to our build jobs.

-K

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-26 19:24 ` Konstantin Ryabitsev
@ 2024-02-26 19:43   ` Jakub Kicinski
  2024-02-26 19:58     ` Jonathan Corbet
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-26 19:43 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: workflows, corbet, ast, linux-doc

On Mon, 26 Feb 2024 14:24:39 -0500 Konstantin Ryabitsev wrote:
> In general, my previous experience enabling libravatar on git.kernel.org has
> taught me that many very vocal people *really* don't like to have any kind of
> statistics gathered about them. However, if it's just for docs.kernel.org,
> then I don't think I have specific objections.
> 
> That said, I would need help turning this on -- if someone can pass me along a
> Sphinx configuration option that I can enable during build time, then I'll be
> happy to add it to our build jobs.

Excellent :)

Let me CC linux-doc in case someone can tell us how to hook things in.

Could you give me a ballpark number of page hits for docs.kernel.org?
Would 500k page views a month be enough? Plausible's pricing depends
on the number of views, and I need to know how much money to ask for.

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-26 19:43   ` Jakub Kicinski
@ 2024-02-26 19:58     ` Jonathan Corbet
  2024-02-26 22:52       ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Corbet @ 2024-02-26 19:58 UTC (permalink / raw)
  To: Jakub Kicinski, Konstantin Ryabitsev; +Cc: workflows, ast, linux-doc

Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 26 Feb 2024 14:24:39 -0500 Konstantin Ryabitsev wrote:
>> In general, my previous experience enabling libravatar on git.kernel.org has
>> taught me that many very vocal people *really* don't like to have any kind of
>> statistics gathered about them. However, if it's just for docs.kernel.org,
>> then I don't think I have specific objections.
>> 
>> That said, I would need help turning this on -- if someone can pass me along a
>> Sphinx configuration option that I can enable during build time, then I'll be
>> happy to add it to our build jobs.
>
> Excellent :)
>
> Let me CC linux-doc in case someone can tell us how to hook things in.

It's probably not just a configuration option.  I suspect that this will
need to be done either by editing the templates or with a little
extension.  Either could require adding this support to the kernel repo,
which might raise some eyebrows.

jon

* Re: Simple analytics for docs.kernel.org and patchwork, please?
  2024-02-26 19:58     ` Jonathan Corbet
@ 2024-02-26 22:52       ` Jakub Kicinski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-26 22:52 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: Konstantin Ryabitsev, workflows, ast, linux-doc

On Mon, 26 Feb 2024 12:58:43 -0700 Jonathan Corbet wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> > On Mon, 26 Feb 2024 14:24:39 -0500 Konstantin Ryabitsev wrote:  
> >> In general, my previous experience enabling libravatar on git.kernel.org has
> >> taught me that many very vocal people *really* don't like to have any kind of
> >> statistics gathered about them. However, if it's just for docs.kernel.org,
> >> then I don't think I have specific objections.
> >> 
> >> That said, I would need help turning this on -- if someone can pass me along a
> >> Sphinx configuration option that I can enable during build time, then I'll be
> >> happy to add it to our build jobs.  
> >
> > Excellent :)
> >
> > Let me CC linux-doc in case someone can tell us how to hook things in.  
> 
> It's probably not just a configuration option.  I suspect that this will
> need to be done either by editing the templates or with a little
> extension.  Either could require adding this support to the kernel repo,
> which might raise some eyebrows.

FWIW I tried poking around to insert "script_files" into conf.py,
because the RTD template does seem to have:

      {%- for scriptfile in script_files %}
        {{ js_tag(scriptfile) }}
      {%- endfor %}

But I only managed to add a pure "include" with just the 'src'
attribute on the <script> node, like:

	<script src="../../cabbage.js"></script>

We also need to set 'defer' and 'data-domain="docs.kernel.org"'.
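
For reference, Sphinx (since 1.8) documents that html_js_files entries may be (filename, attributes) tuples, with the attributes written onto the generated <script> tag. If the theme renders script_files through js_tag(), as the template excerpt above suggests, this might cover 'defer' and 'data-domain' without touching the templates. A sketch for conf.py, untested against the kernel's docs build; DOCS_PLAUSIBLE_DOMAIN is a hypothetical environment variable that only the docs.kernel.org build job would set, so ordinary local builds load no third-party script. If conf.py already defines html_js_files, the entry would need to be appended instead.

```python
import os


def plausible_js_files(domain):
    """html_js_files entries that load the Plausible script for `domain`.

    Each entry is a (URL, attributes) tuple; the attributes dict is what
    should end up as defer="defer" data-domain="..." on the <script> tag.
    """
    return [
        ("https://plausible.io/js/script.js",
         {"defer": "defer", "data-domain": domain}),
    ]


# Hypothetical build-time switch, e.g.:
#   DOCS_PLAUSIBLE_DOMAIN=docs.kernel.org make htmldocs
# Leaving it unset keeps the default of no extra scripts.
_plausible_domain = os.environ.get("DOCS_PLAUSIBLE_DOMAIN", "")
html_js_files = plausible_js_files(_plausible_domain) if _plausible_domain else []
```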

end of thread

Thread overview: 9+ messages
2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
2024-02-23 17:14 ` Mauro Carvalho Chehab
2024-02-23 17:49 ` Jonathan Corbet
2024-02-23 20:02   ` Jakub Kicinski
2024-02-26 19:06 ` Jakub Kicinski
2024-02-26 19:24 ` Konstantin Ryabitsev
2024-02-26 19:43   ` Jakub Kicinski
2024-02-26 19:58     ` Jonathan Corbet
2024-02-26 22:52       ` Jakub Kicinski
