workflows.vger.kernel.org archive mirror
From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: workflows@vger.kernel.org,
	Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
	corbet@lwn.net, ast@kernel.org
Subject: Re: Simple analytics for docs.kernel.org and patchwork, please?
Date: Fri, 23 Feb 2024 18:14:03 +0100
Message-ID: <20240223175523.5a428f28@coco.lan>
In-Reply-To: <20240223083154.4fbee63c@kernel.org>

On Fri, 23 Feb 2024 08:31:54 -0800,
Jakub Kicinski <kuba@kernel.org> wrote:

> Hi!
> 
> We have a few netdev-related bots with various simple status pages.
> I hooked them up to analytics recently, here's the dash:
> https://plausible.io/netdev.bots.linux.dev
> 
> Plausible was described here: https://lwn.net/Articles/822568/
> it's supposedly-open, and privacy-focused, no cookies etc.
> 
> It's useful for me when deciding where to invest my time,
> and to back up the efforts to my employer with some data.
> 
> Now, most of us agree that kernel docs leave something to be desired.
> At the same time maintainers are repeatedly faced with people who post
> code without reading the docs, which puts the time invested in writing
> them into question. I can't help but think that providing some
> analytics for docs.kernel.org traffic would be beneficial. 
> I would use it.
> 
> Thoughts?
> 
> Does anyone think that even non-intrusive analytics are a no go?
> 
> Does anyone know better alternatives than Plausible?

I have a small hand-made script that uses Pandas/Seaborn to produce
some Patchwork statistics, which can be seen at:

	https://linuxtv.org/patchwork_stats.php

Feel free to use it as a basis to get some stats.

You may need to modify it to produce stats per project. On Linux media there
are just two projects, and the second one gets only a handful of patches per
year, so we didn't need to filter per project.
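A per-project filter could look roughly like the minimal sketch below. It uses SQLite in place of MariaDB so that it runs stand-alone (hence `strftime()` instead of `DATE_FORMAT()`), and the trimmed schema, in particular the `project_id` column on `patchwork_patch` and the `linkname` column on `patchwork_project`, is an assumption for illustration rather than Patchwork 3.1's exact layout. The helper names `patches_per_month()` and `demo_db()` are hypothetical, and the sample rows are made up:

```python
import sqlite3

# Hypothetical, heavily trimmed tables: SQLite stands in for MariaDB, and the
# project_id/linkname columns are assumed rather than taken from Patchwork.
SCHEMA = """
CREATE TABLE patchwork_project (id INTEGER PRIMARY KEY, linkname TEXT);
CREATE TABLE patchwork_patch (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES patchwork_project(id),
    state_id INTEGER,
    date TEXT  -- 'YYYY-MM-DD HH:MM:SS', like a DATETIME column
);
"""


def patches_per_month(conn, linkname, start, end):
    """Count one project's patches per month, for start <= date < end."""
    query = """
        SELECT strftime('%Y-%m', p.date) AS month, count(*) AS patches
        FROM patchwork_patch AS p
        JOIN patchwork_project AS pr ON pr.id = p.project_id
        WHERE pr.linkname = ? AND p.date >= ? AND p.date < ?
        GROUP BY month
        ORDER BY month
    """
    # Bound parameters, rather than f-string interpolation, keep the project
    # name and the dates out of the SQL text itself.
    return conn.execute(query, (linkname, start, end)).fetchall()


def demo_db():
    """Build an in-memory database holding a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patchwork_project VALUES (?, ?)",
                     [(1, "linux-media"), (2, "media-ci")])
    conn.executemany(
        "INSERT INTO patchwork_patch VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "2024-01-05 10:00:00"),
         (2, 1, 1, "2024-01-20 12:00:00"),
         (3, 1, 2, "2024-02-01 09:00:00"),
         (4, 2, 1, "2024-01-07 08:00:00")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_db()
    print(patches_per_month(conn, "linux-media", "2024-01-01", "2024-03-01"))
```

With the sample rows, the call prints `[('2024-01', 2), ('2024-02', 1)]`; the single `media-ci` patch is filtered out by the join on the project name.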

Also, please note that the second query doesn't use any index on Patchwork 3.1.

I ended up manually creating an index on MariaDB to speed it up:

	CREATE INDEX idx_patchwork_patch_stateid_date on patchwork_patch(state_id, date);
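One way to confirm that a composite `(state_id, date)` index is actually picked up is to compare query plans before and after creating it. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN`, purely so that it runs without a database server; the one-table schema and the helper name `compare_plans()` are hypothetical, and MariaDB's `EXPLAIN` output looks different:

```python
import sqlite3

# A lookup that filters on both indexed columns: equality on state_id,
# then a half-open range on date.
QUERY = ("SELECT count(*) FROM patchwork_patch "
         "WHERE state_id = ? AND date >= ? AND date < ?")
PARAMS = (1, "2024-01-01", "2024-02-01")


def plan(conn):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    return [row[-1] for row in rows]


def compare_plans():
    """Hypothetical helper: plans without and with the composite index."""
    conn = sqlite3.connect(":memory:")
    # Trimmed stand-in for the real table (assumption, not Patchwork's schema).
    conn.execute("CREATE TABLE patchwork_patch ("
                 "id INTEGER PRIMARY KEY, state_id INTEGER, "
                 "date TEXT, name TEXT)")
    before = plan(conn)  # expected to be a full table scan
    conn.execute("CREATE INDEX idx_patchwork_patch_stateid_date "
                 "ON patchwork_patch(state_id, date)")
    after = plan(conn)   # expected to search via the new index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
```

Putting `state_id` first matters: an equality match on the leading column lets the date range be resolved inside the index, while an index on `date` alone could only narrow the range and would still have to check each row's state.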

I hope that helps.

Regards,
Mauro

---

#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) Mauro Carvalho Chehab <mchehab@kernel.org>

from datetime import datetime, date, timedelta
from matplotlib.dates import DateFormatter
from matplotlib.pyplot import xlim
from pandas import read_sql
from seaborn import relplot, set_style,axes_style
from sqlalchemy import create_engine, text

DIR = './'

def log(msg):
    now = datetime.now().strftime("%d/%m/%Y %H:%M:%S")
    print(f'{now}: {msg}')

today = date.today()

# Consider yesterday as the final date
end_date = today - timedelta(days=1)

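# Start at the first day of the previous month, two years back. Counting
# months avoids the ValueError that date.replace() raises for month 0 (in
# January) or for a day the target month lacks (e.g. Feb 30).
months = (today.year - 2) * 12 + (today.month - 1) - 1
start_date = date(months // 12, months % 12 + 1, 1)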

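# Use "date < today" so that all of yesterday is included: comparing a
# DATETIME column with a bare date only matches up to midnight of that day.
interval = f'date >= "{start_date}" and date < "{today}"'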

log("Connecting to database")

engine = create_engine("mysql://patchwork:yaicCoqui@localhost/patchwork?charset=utf8mb4")

palette = "bright"
background = "#555555"

style = {
    'axes.facecolor':background,
    'grid.color':'white',
    'axes.edgecolor': 'orange',
    'axes.labelcolor': 'orange',
    'text.color': '#ffcc00',
    'xtick.color': 'white',
    'ytick.color': 'white',
    'patch.edgecolor': 'orange',
    'figure.facecolor':'black'
}


xformatter = DateFormatter("%Y-%m")

with engine.connect() as conn:
    # Total patches
    query = text('SELECT DATE_FORMAT(date, "%Y-%m") AS date, count(*) AS patches '
                 f'FROM patchwork_patch WHERE {interval} '
                 'GROUP BY DATE_FORMAT(date, "%Y-%m") '
                 'ORDER BY YEAR(date), MONTH(date)')
    log(query)
    total = read_sql(query, con=conn, parse_dates=['date'])

    log("Creating total patches graph")

    set_style(style="darkgrid",rc=style)

    print({k: v for k, v in axes_style().items() if "color" in k})

    g = relplot(kind="line", marker='x', markers=True, data=total, x="date", y="patches")
    g.set_axis_labels("Date", "Number of patches", labelpad=10)
    g.set(title=f'Number of patches received per month between {start_date} and {end_date}')
    g.figure.set_size_inches(14, 6)
    g.ax.margins(.05)
    g.ax.autoscale_view()
    g.despine(trim=True, left=True, bottom=True)
    xlim(start_date, end_date)
    g.axes[0,0].xaxis.set_major_formatter(xformatter)

    g.savefig(DIR + 'patches_per_date.svg')

    # Patches per state
    query = text('SELECT DATE_FORMAT(date, "%Y-%m") AS date, st.name AS State, count(*) AS patches '
                 'FROM patchwork.patchwork_patch AS p '
                 'JOIN patchwork_state AS st ON p.state_id = st.id '
                 f'WHERE {interval} '
                 'GROUP BY DATE_FORMAT(date, "%Y-%m"), st.id')
    log(query)
    per_state = read_sql(query, con=conn, parse_dates=['date'])

    log("Creating patches per state")

    per_state.set_index('date', inplace=True)

    g = relplot(kind="line", data=per_state, x="date", y="patches", hue="State", markers="State", marker="X", palette=palette)

    g.set_axis_labels("Date", "Number of patches", labelpad=10)
    g.set(title=f'Number of patches per state received per month between {start_date} and {end_date}')
    g.figure.set_size_inches(13.5, 6)
    g.ax.margins(.05)
    g.ax.autoscale_view()
    g.despine(trim=True, left=True, bottom=True)
    xlim(start_date, end_date)
    g.axes[0,0].xaxis.set_major_formatter(xformatter)

    g.add_legend(loc='upper left', bbox_to_anchor=(1.12, 0.5))
    g.savefig(DIR + 'patches_per_state.svg')

    log("Done.")
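The start/end computation at the top of the script can also be isolated and tested on its own. Below is a minimal sketch with a hypothetical helper, `reporting_window()`, that reproduces the intended window (from the first day of the previous month two years back, up to yesterday) using a month counter, so it does not hit the `ValueError` that a `date.replace()` chain raises in January or when the target month lacks the current day:

```python
from datetime import date, timedelta


def reporting_window(today):
    """Return (start, end) for a report that ends yesterday.

    start is the first day of the month before today's month, two years
    back; end is the day before today.
    """
    end = today - timedelta(days=1)
    # Zero-based month counter: subtracting one month here can never yield
    # month 0, and pinning the day to 1 avoids non-existent dates.
    months = (today.year - 2) * 12 + (today.month - 1) - 1
    start = date(months // 12, months % 12 + 1, 1)
    return start, end


if __name__ == "__main__":
    print(reporting_window(date(2024, 2, 23)))
```

For 2024-02-23 this gives a window from 2022-01-01 to 2024-02-22, and in January it rolls back into December of the earlier year instead of failing.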


Thread overview: 9+ messages
2024-02-23 16:31 Jakub Kicinski
2024-02-23 17:14 ` Mauro Carvalho Chehab [this message]
2024-02-23 17:49 ` Jonathan Corbet
2024-02-23 20:02   ` Jakub Kicinski
2024-02-26 19:06 ` Jakub Kicinski
2024-02-26 19:24 ` Konstantin Ryabitsev
2024-02-26 19:43   ` Jakub Kicinski
2024-02-26 19:58     ` Jonathan Corbet
2024-02-26 22:52       ` Jakub Kicinski
