* Simple analytics for docs.kernel.org and patchwork, please?
@ 2024-02-23 16:31 Jakub Kicinski
2024-02-23 17:14 ` Mauro Carvalho Chehab
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-23 16:31 UTC (permalink / raw)
To: workflows, Konstantin Ryabitsev; +Cc: corbet, ast
Hi!
We have a few netdev-related bots with various simple status pages.
I hooked them up to analytics recently, here's the dash:
https://plausible.io/netdev.bots.linux.dev
Plausible was described here: https://lwn.net/Articles/822568/
it's supposedly-open, and privacy-focused, no cookies etc.
It's useful for me when deciding where to invest my time,
and to back up the efforts to my employer with some data.
Now, most of us agree that kernel docs leave something to be desired.
At the same time maintainers are repeatedly faced with people who post
code without reading the docs, which puts the time invested in writing
them into question. I can't help but think that providing some
analytics for docs.kernel.org traffic would be beneficial.
I would use it.
Thoughts?
Does anyone think that even non-intrusive analytics are a no go?
Does anyone know better alternatives than Plausible?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
@ 2024-02-23 17:14 ` Mauro Carvalho Chehab
2024-02-23 17:49 ` Jonathan Corbet
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Mauro Carvalho Chehab @ 2024-02-23 17:14 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: workflows, Konstantin Ryabitsev, corbet, ast
On Fri, 23 Feb 2024 08:31:54 -0800
Jakub Kicinski <kuba@kernel.org> wrote:
> Hi!
>
> We have a few netdev-related bots with various simple status pages.
> I hooked them up to analytics recently, here's the dash:
> https://plausible.io/netdev.bots.linux.dev
>
> Plausible was described here: https://lwn.net/Articles/822568/
> it's supposedly-open, and privacy-focused, no cookies etc.
>
> It's useful for me when deciding where to invest my time,
> and to back up the efforts to my employer with some data.
>
> Now, most of us agree that kernel docs leave something to be desired.
> At the same time maintainers are repeatedly faced with people who post
> code without reading the docs, which puts the time invested in writing
> them into question. I can't help but think that providing some
> analytics for docs.kernel.org traffic would be beneficial.
> I would use it.
>
> Thoughts?
>
> Does anyone think that even non-intrusive analytics are a no go?
>
> Does anyone know better alternatives than Plausible?

I have a small hand-made script using Pandas/Seaborn to produce some
patchwork statistics, as can be seen at:

https://linuxtv.org/patchwork_stats.php

Feel free to use it as a basis to get some stats. You may need to modify it
to cover stats per project (on Linux media there are just two projects, and
the second one gets only a handful of patches per year, so we didn't need to
filter per project).

Also, please notice that the second query doesn't use any index on
Patchwork 3.1. I ended up manually creating an index to speed it up on
mariadb with:

CREATE INDEX idx_patchwork_patch_stateid_date on patchwork_patch(state_id, date);

I hope that helps.

Regards,
Mauro

---
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) Mauro Carvalho Chehab <mchehab@kernel.org>

from datetime import datetime, date, timedelta

from matplotlib.dates import DateFormatter
from matplotlib.pyplot import xlim
from pandas import read_sql
from seaborn import relplot, set_style, axes_style
from sqlalchemy import create_engine, text

DIR = './'

def log(msg):
    now = datetime.now().strftime("%d/%m/%Y %H:%M:%S")
    print(f'{now}: {msg}')

today = date.today()

# Consider yesterday as the final date
end_date = today - timedelta(days=1)

# Two complete years + this month. Set day=1 first so that replace() can
# never produce an invalid date (e.g. Feb 29), and wrap around to December
# when today is in January.
start_date = end_date.replace(day=1, year=today.year - 2)
if today.month > 1:
    start_date = start_date.replace(month=today.month - 1)
else:
    start_date = start_date.replace(year=today.year - 3, month=12)

interval = f'date >= "{start_date}" and date <= "{end_date}"'

log("Connecting to database")
engine = create_engine("mysql://patchwork:yaicCoqui@localhost/patchwork?charset=utf8mb4")

palette = "bright"
background = "#555555"
style = {
    'axes.facecolor': background,
    'grid.color': 'white',
    'axes.edgecolor': 'orange',
    'axes.labelcolor': 'orange',
    'text.color': '#ffcc00',
    'xtick.color': 'white',
    'ytick.color': 'white',
    'patch.edgecolor': 'orange',
    'figure.facecolor': 'black'
}

xformatter = DateFormatter("%Y-%m")

with engine.connect() as conn:
    # Total patches
    query = text(f'select DATE_FORMAT(date, "%Y-%m") AS date, count(*) AS patches from patchwork_patch WHERE {interval} group by DATE_FORMAT(date, "%Y-%m") ORDER BY YEAR(date), MONTH(DATE)')
    log(query)
    total = read_sql(query, con=conn, parse_dates=['date'])

    log("Creating total patches graph")
    set_style(style="darkgrid", rc=style)
    print({k: v for k, v in axes_style().items() if "color" in k})

    g = relplot(kind="line", marker='x', markers=True, data=total, x="date", y="patches")
    g.set_axis_labels("Date", "Number of patches", labelpad=10)
    g.set(title=f'Number of patches received per month between {start_date} and {end_date}')
    g.figure.set_size_inches(14, 6)
    print(g.ax)
    g.ax.margins(.05)
    g.ax.autoscale_view()
    g.ax.edgecolor = "black"
    g.despine(trim=True, left=True, bottom=True)
    xlim(start_date, end_date)
    g.axes[0,0].xaxis.set_major_formatter(xformatter)
    g.savefig(DIR + 'patches_per_date.svg')

    # Patches per state
    query = text(f'select DATE_FORMAT(date, "%Y-%m") as date, st.name as State, count(*) as patches from patchwork.patchwork_patch AS p, patchwork_state as st where state_id = st.id and {interval} group by DATE_FORMAT(date, "%Y-%m"), st.id')
    log(query)
    per_state = read_sql(query, con=conn, parse_dates=['date'])

    log("Creating patches per state")
    per_state.set_index('date', inplace=True)

    g = relplot(kind="line", data=per_state, x="date", y="patches", hue="State", markers="State", marker="X", palette=palette)
    g.set_axis_labels("Date", "Number of patches", labelpad=10)
    g.set(title=f'Number of patches per state received per month between {start_date} and {end_date}')
    g.figure.set_size_inches(13.5, 6)
    g.ax.margins(.05)
    g.ax.autoscale_view()
    g.ax.edgecolor = False
    g.despine(trim=True, left=True, bottom=True)
    xlim(start_date, end_date)
    g.axes[0,0].xaxis.set_major_formatter(xformatter)
    g.add_legend(loc='upper left', bbox_to_anchor=(1.12, 0.5))
    g.savefig(DIR + 'patches_per_state.svg')

log("Done.")
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
2024-02-23 17:14 ` Mauro Carvalho Chehab
@ 2024-02-23 17:49 ` Jonathan Corbet
2024-02-23 20:02 ` Jakub Kicinski
2024-02-26 19:06 ` Jakub Kicinski
2024-02-26 19:24 ` Konstantin Ryabitsev
3 siblings, 1 reply; 9+ messages in thread
From: Jonathan Corbet @ 2024-02-23 17:49 UTC (permalink / raw)
To: Jakub Kicinski, workflows, Konstantin Ryabitsev; +Cc: ast
Jakub Kicinski <kuba@kernel.org> writes:
> Does anyone think that even non-intrusive analytics are a no go?

What sorts of analytics are you looking for? Simple logfile analysis
should be fairly uncontroversial and would tell you which documents are
most of interest to the AI bots^W^Wdevelopers.

Anything requiring, say, javascript in the browser is likely to get
blocked by the kinds of people who might be interested in kernel docs.

We did an overview of relatively innocuous analytics packages a few
years ago:

https://lwn.net/Articles/822568/

jon
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-23 17:49 ` Jonathan Corbet
@ 2024-02-23 20:02 ` Jakub Kicinski
0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-23 20:02 UTC (permalink / raw)
To: Jonathan Corbet; +Cc: workflows, Konstantin Ryabitsev, ast
On Fri, 23 Feb 2024 10:49:35 -0700 Jonathan Corbet wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > Does anyone think that even non-intrusive analytics are a no go?
>
> What sorts of analytics are you looking for? Simple logfile analysis
> should be fairly uncontroversial and would tell you which documents are
> most of interest to the AI bots^W^Wdevelopers.

Yes, basic analysis of access.log would do. I think that's equivalent
to what Plausible does. More of a question of what existing solution
we can set up quickly, but have no preference on which method or tool
we end up using. All we need is hit count for a subpage, with some basic
dedup of a single reader hitting refresh...

> Anything requiring, say, javascript in the browser is likely to get
> blocked by the kinds of people who might be interested in kernel docs.

Interesting. I spent 20min grepping the netdev's access.log.
This may be confirmation bias, but the vast majority of the hits are
more or less thinly veiled bots. Unless we believe that someone
from an Android phone decided to visit "admin.php" after landing
on our page... (admin.php obviously doesn't exist)

I zeroed in on the following metric - users who came from patchwork
(clicked on CI results) over the last week. Plausible -> 17,
IP addresses in access log with the right referer -> 18.
The dates in logs may not match up exactly so the small delta
is expected.

After doing this exercise, I'd like to withdraw my previous statement
that "access.log analysis" is fine. Now I think it's far more likely
we'd miscount bots than that someone legit has blocked javascript...

> We did an overview of relatively innocuous analytics packages a few
> years ago:
>
> https://lwn.net/Articles/822568/

We need some analysis of how much of an email people actually read :)
Look at the second paragraph of my first email, where do you think
I found Plausible if not LWN ;)
^ permalink raw reply [flat|nested] 9+ messages in thread
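The metric described in the message above (hits per page, deduplicated per reader, optionally restricted to visitors arriving from patchwork) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server is assumed to write the common "combined" access log format, the function name `unique_readers` and the `BOT_HINTS` list are hypothetical, and the bot heuristic is deliberately crude, which is exactly the weakness the message points out.

```python
import re
from collections import defaultdict

# Assumed "combined" log format:
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, crude heuristic: substrings that self-identifying bots
# tend to put in their User-Agent. Thinly veiled bots will slip through.
BOT_HINTS = ("bot", "crawl", "spider", "curl", "python-requests")


def unique_readers(lines, referer_substr=None):
    """Return {path: number of distinct client IPs} for successful GETs.

    Counting distinct IPs per path is the "basic dedup": a reader hitting
    refresh is counted once. If referer_substr is given, only requests
    whose Referer header contains it are counted.
    """
    seen = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # not in the assumed format
        if m["method"] != "GET" or m["status"] != "200":
            continue  # drops probes such as a 404 on admin.php
        if any(hint in m["agent"].lower() for hint in BOT_HINTS):
            continue
        if referer_substr and referer_substr not in m["referer"]:
            continue
        seen[m["path"]].add(m["ip"])
    return {path: len(ips) for path, ips in seen.items()}
```

Calling `unique_readers(open("access.log"), referer_substr="patchwork")` would give the access-log side of the comparison with the Plausible number; the file name here is a placeholder.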
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
2024-02-23 17:14 ` Mauro Carvalho Chehab
2024-02-23 17:49 ` Jonathan Corbet
@ 2024-02-26 19:06 ` Jakub Kicinski
2024-02-26 19:24 ` Konstantin Ryabitsev
3 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-26 19:06 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: workflows, corbet, ast
On Fri, 23 Feb 2024 08:31:54 -0800 Jakub Kicinski wrote:
> Hi!
>
> We have a few netdev-related bots with various simple status pages.
> I hooked them up to analytics recently, here's the dash:
> https://plausible.io/netdev.bots.linux.dev
>
> Plausible was described here: https://lwn.net/Articles/822568/
> it's supposedly-open, and privacy-focused, no cookies etc.
>
> It's useful for me when deciding where to invest my time,
> and to back up the efforts to my employer with some data.

Hi Konstantin, are you open to trying some analytics?
If yes I will go ask for approval to pay the bill.

FWIW it's not a lot of work, for netdev pages I add the tracker with sed:

sed -i 's@</title>$@</title><script defer data-domain="netdev.bots.linux.dev" src="https://plausible.io/js/script.js"></script>@' $file

If there's a concern that other maintainers don't want this we can
selectively sed just the networking pages?
^ permalink raw reply [flat|nested] 9+ messages in thread
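For build jobs that post-process HTML from Python rather than the shell, the sed one-liner in the message above could be mirrored roughly as follows. This is a sketch, not the netdev tooling: the function name `add_tracker` is hypothetical, and the tracker string is copied from the sed command.

```python
import re

# The <script> tag from the sed command in the thread.
TRACKER = ('<script defer data-domain="netdev.bots.linux.dev" '
           'src="https://plausible.io/js/script.js"></script>')


def add_tracker(html, tracker=TRACKER):
    """Append the tracker after every </title> that ends a line.

    Like the sed expression, this only matches </title> at end of line,
    so running it a second time changes nothing: after the first run the
    closing tag is followed by the <script> element.
    """
    # A function is used as the replacement so that nothing in the
    # tracker string is interpreted as a regex backreference.
    return re.sub(r'</title>$', lambda m: m.group(0) + tracker,
                  html, flags=re.MULTILINE)
```

Restricting the tracker to the networking pages, as the message suggests, would then be a matter of which files are passed to the function.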
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
` (2 preceding siblings ...)
2024-02-26 19:06 ` Jakub Kicinski
@ 2024-02-26 19:24 ` Konstantin Ryabitsev
2024-02-26 19:43 ` Jakub Kicinski
3 siblings, 1 reply; 9+ messages in thread
From: Konstantin Ryabitsev @ 2024-02-26 19:24 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: workflows, corbet, ast
On Fri, Feb 23, 2024 at 08:31:54AM -0800, Jakub Kicinski wrote:
> Hi!
>
> We have a few netdev-related bots with various simple status pages.
> I hooked them up to analytics recently, here's the dash:
> https://plausible.io/netdev.bots.linux.dev
>
> Plausible was described here: https://lwn.net/Articles/822568/
> it's supposedly-open, and privacy-focused, no cookies etc.
>
> It's useful for me when deciding where to invest my time,
> and to back up the efforts to my employer with some data.
>
> Now, most of us agree that kernel docs leave something to be desired.
> At the same time maintainers are repeatedly faced with people who post
> code without reading the docs, which puts the time invested in writing
> them into question. I can't help but think that providing some
> analytics for docs.kernel.org traffic would be beneficial.
> I would use it.
>
> Thoughts?
>
> Does anyone think that even non-intrusive analytics are a no go?

In general, my previous experience enabling libravatar on git.kernel.org has
taught me that many very vocal people *really* don't like to have any kind of
statistics gathered about them. However, if it's just for docs.kernel.org,
then I don't think I have specific objections.

That said, I would need help turning this on -- if someone can pass me along a
Sphinx configuration option that I can enable during build time, then I'll be
happy to add it to our build jobs.

-K
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-26 19:24 ` Konstantin Ryabitsev
@ 2024-02-26 19:43 ` Jakub Kicinski
2024-02-26 19:58 ` Jonathan Corbet
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-26 19:43 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: workflows, corbet, ast, linux-doc
On Mon, 26 Feb 2024 14:24:39 -0500 Konstantin Ryabitsev wrote:
> In general, my previous experience enabling libravatar on git.kernel.org has
> taught me that many very vocal people *really* don't like to have any kind of
> statistics gathered about them. However, if it's just for docs.kernel.org,
> then I don't think I have specific objections.
>
> That said, I would need help turning this on -- if someone can pass me along a
> Sphinx configuration option that I can enable during build time, then I'll be
> happy to add it to our build jobs.

Excellent :)

Let me CC linux-doc in case someone can tell us how to hook things in.

Could you give me a ballpark number of page hits for docs.kernel.org?
500k page views a month should be enough? Plausible has different
pricing depending on number of views, I need to know how much money
to ask for.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-26 19:43 ` Jakub Kicinski
@ 2024-02-26 19:58 ` Jonathan Corbet
2024-02-26 22:52 ` Jakub Kicinski
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Corbet @ 2024-02-26 19:58 UTC (permalink / raw)
To: Jakub Kicinski, Konstantin Ryabitsev; +Cc: workflows, ast, linux-doc
Jakub Kicinski <kuba@kernel.org> writes:
> On Mon, 26 Feb 2024 14:24:39 -0500 Konstantin Ryabitsev wrote:
>> In general, my previous experience enabling libravatar on git.kernel.org has
>> taught me that many very vocal people *really* don't like to have any kind of
>> statistics gathered about them. However, if it's just for docs.kernel.org,
>> then I don't think I have specific objections.
>>
>> That said, I would need help turning this on -- if someone can pass me along a
>> Sphinx configuration option that I can enable during build time, then I'll be
>> happy to add it to our build jobs.
>
> Excellent :)
>
> Let me CC linux-doc in case someone can tell us how to hook things in.

It's probably not just a configuration option. I suspect that this will
need to be done either by editing the templates or with a little
extension. Either could require adding this support to the kernel repo,
which might raise some eyebrows.

jon
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Simple analytics for docs.kernel.org and patchwork, please?
2024-02-26 19:58 ` Jonathan Corbet
@ 2024-02-26 22:52 ` Jakub Kicinski
0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2024-02-26 22:52 UTC (permalink / raw)
To: Jonathan Corbet; +Cc: Konstantin Ryabitsev, workflows, ast, linux-doc
On Mon, 26 Feb 2024 12:58:43 -0700 Jonathan Corbet wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> > On Mon, 26 Feb 2024 14:24:39 -0500 Konstantin Ryabitsev wrote:
> >> In general, my previous experience enabling libravatar on git.kernel.org has
> >> taught me that many very vocal people *really* don't like to have any kind of
> >> statistics gathered about them. However, if it's just for docs.kernel.org,
> >> then I don't think I have specific objections.
> >>
> >> That said, I would need help turning this on -- if someone can pass me along a
> >> Sphinx configuration option that I can enable during build time, then I'll be
> >> happy to add it to our build jobs.
> >
> > Excellent :)
> >
> > Let me CC linux-doc in case someone can tell us how to hook things in.
>
> It's probably not just a configuration option. I suspect that this will
> need to be done either by editing the templates or with a little
> extension. Either could require adding this support to the kernel repo,
> which might raise some eyebrows.

FWIW I tried poking around to insert "script_files" into conf.py,
because the RTD template does seem to have:

{%- for scriptfile in script_files %}
{{ js_tag(scriptfile) }}
{%- endfor %}

But I only managed to add a pure "include" with just the 'src'
attribute on the <script> node, like:

<script src="../../cabbage.js"></script>

We also need to set 'defer' and 'data-domain="docs.kernel.org"'.
^ permalink raw reply [flat|nested] 9+ messages in thread
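One avenue the thread does not try is Sphinx's documented `html_js_files` setting, whose entries may be `(filename, attributes)` tuples; the attributes dictionary is rendered onto the generated `<script>` tag. The fragment below is a sketch only. It assumes a Sphinx release that accepts attribute dictionaries in `html_js_files` and a theme that emits script tags through `js_tag()`, as the RTD template quoted above does. Whether such a line belongs in the kernel's own conf.py is the policy question raised earlier in the thread, not something this fragment settles.

```python
# conf.py fragment (sketch, not taken from the kernel tree).
# Each entry is either a bare filename/URL or a (filename, attributes)
# tuple; the attributes end up on the generated <script> element.
html_js_files = [
    (
        "https://plausible.io/js/script.js",
        {
            "defer": "defer",
            "data-domain": "docs.kernel.org",
        },
    ),
]
```

An extension could do the same at build time with `app.add_js_file()`, passing the attributes as keyword arguments, which would keep the setting out of conf.py and in the build job.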
end of thread, other threads:[~2024-02-26 22:52 UTC | newest]
Thread overview: 9+ messages
2024-02-23 16:31 Simple analytics for docs.kernel.org and patchwork, please? Jakub Kicinski
2024-02-23 17:14 ` Mauro Carvalho Chehab
2024-02-23 17:49 ` Jonathan Corbet
2024-02-23 20:02 ` Jakub Kicinski
2024-02-26 19:06 ` Jakub Kicinski
2024-02-26 19:24 ` Konstantin Ryabitsev
2024-02-26 19:43 ` Jakub Kicinski
2024-02-26 19:58 ` Jonathan Corbet
2024-02-26 22:52 ` Jakub Kicinski