Re: Simple analytics for docs.kernel.org and patchwork, please?

From: Jakub Kicinski <kuba@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: workflows@vger.kernel.org,
	Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
	ast@kernel.org
Subject: Re: Simple analytics for docs.kernel.org and patchwork, please?
Date: Fri, 23 Feb 2024 12:02:00 -0800
Message-ID: <20240223120200.2e04dd3d@kernel.org>
In-Reply-To: <87sf1j6pg0.fsf@meer.lwn.net>

On Fri, 23 Feb 2024 10:49:35 -0700 Jonathan Corbet wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> 
> > Does anyone think that even non-intrusive analytics are a no go?  
> 
> What sorts of analytics are you looking for?  Simple logfile analysis
> should be fairly uncontroversial and would tell you which documents are
> most of interest to the AI bots^W^Wdevelopers. 

Yes, basic analysis of access.log would do. I think that's equivalent
to what Plausible does. It's more a question of which existing solution
we can set up quickly; I have no preference on which method or tool we
end up using.

All we need is a hit count per subpage, with some basic dedup of
a single reader hitting refresh...
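
A rough sketch of that kind of count, assuming the server writes the
common "combined" access.log format. The 30-minute dedup window and the
names count_hits and LOG_RE are illustrative assumptions, not anything
that is actually deployed:

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed combined log format:
# ip - user [time] "METHOD path PROTO" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def count_hits(lines, window=timedelta(minutes=30)):
    """Count successful GETs per path, skipping repeats of the same
    (ip, path) pair that arrive within `window` of the last counted hit."""
    last_counted = {}  # (ip, path) -> timestamp of the last counted hit
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        key = (m["ip"], m["path"])
        prev = last_counted.get(key)
        if prev is None or ts - prev > window:
            hits[m["path"]] += 1
            last_counted[key] = ts
    return hits
```

Calling count_hits(open("access.log")) would give a Counter keyed by
path. Keying the dedup on IP address alone is crude (NAT, shared
proxies), so treat the numbers as approximate.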

> Anything requiring, say, javascript in the browser is likely to get
> blocked by the kinds of people who might be interested in kernel docs.

Interesting. I spent 20 minutes grepping netdev's access.log.
This may be confirmation bias, but the vast majority of the hits
are more or less thinly veiled bots. Unless we believe that
someone on an Android phone decided to visit "admin.php"
after landing on our page... (admin.php obviously doesn't exist)
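
For illustration, one way to flag such clients is to collect every IP
that got a 404 on a scanner-style path. This is only a sketch: the
suffix list below is a made-up starting point (only admin.php comes
from the log I looked at), and it again assumes the combined log
format:

```python
import re

# Assumed combined log format; only ip, path and status are needed here.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

# Hypothetical probe markers; extend from whatever shows up in the real log.
PROBE_SUFFIXES = ("admin.php", "wp-login.php", "xmlrpc.php")

def probable_bots(lines):
    """Return the set of IPs that received a 404 for a probe-style path."""
    bots = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["status"] != "404":
            continue
        path = m["path"].split("?", 1)[0]  # ignore any query string
        if path.endswith(PROBE_SUFFIXES):
            bots.add(m["ip"])
    return bots
```

Hits from the returned IPs could then be dropped before counting. It
only catches the clumsy bots, of course, which is rather the point of
the paragraph above.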

I zeroed in on the following metric: users who came from patchwork
(clicked on CI results) over the last week. Plausible -> 17,
IP addresses in the access log with the right referer -> 18.
The dates in the logs may not match up exactly, so the small delta is
expected.
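
The access.log side of that comparison can be approximated along these
lines. It is a sketch rather than the exact grep I ran; the function
name is made up, the combined log format is assumed, and the referer
host passed in (e.g. "patchwork.kernel.org") is an assumption about
how the instance is addressed:

```python
import re
from urllib.parse import urlparse

# Assumed combined log format; capture the client IP and the referer field.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def unique_visitors_from(lines, referer_host):
    """Count distinct client IPs whose Referer header points at referer_host."""
    ips = set()
    for line in lines:
        m = LOG_RE.match(line)
        # urlparse("-").hostname is None, so empty referers never match.
        if m and urlparse(m["referer"]).hostname == referer_host:
            ips.add(m["ip"])
    return len(ips)
```

Restricting the input to one week of log lines is left to the caller,
for example by feeding in only the relevant rotated log files.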

After doing this exercise, I'd like to withdraw my previous statement
that "access.log analysis" is fine. Now I think it's far more likely
we'd miscount bots than that someone legit has blocked javascript...

> We did an overview of relatively innocuous analytics packages a few
> years ago:
> 
>   https://lwn.net/Articles/822568/

We need some analysis of how much of an email people actually read :)
Look at the second paragraph of my first email: where do you think
I found Plausible if not LWN ;)
