Browsing my browsing

I mentioned last week that I’d installed MeeTimer and was using it to track my browsing history. Now I’ve built up a weeks worth of data, it’s time to do something fun with it.

If I want to eliminate, or at least manage, distractions, it will be useful to know not just which sites I spend the most accumulated time visiting (MeeTimer already does a pretty good of showing me this), but also which sites I visit most frequently. Because MeeTimer stores all of its lovely date in an SQLite database it’s easy to get to it and create pretty graphs like this one…

Top 30 visited sites for week of 30th March

Even better, lots of scripting languages have support for SQLite (I’m using Xampp as a convenient stack containing Apache PHP 5 and SQLite 3). After hacking around for a couple of hours, my nasty little PHP script was serving up this sort of thing:

Experiment: Browser DNA

(Larger version)

MeeTimer lets you group URLs into different groups, so here those groups are displayed using different coloured rows. Yellow represents site’s I’ve grouped as ‘work’ (mainly work’s webmail address), so it’s easy to see that when I was working on my laptop at home on Friday, i.e. 2 days ago, I was accessing work webmail pretty constantly. I have a desktop at work, so on most days I don’t need to use webmail to check my email except for on the train on the way in and out, but for some reason I had it open for ages on Monday morning (i.e 6 days ago). Perhaps I was away from desk?

I’m also experimenting with alternative ways of displaying the history, including showing the favicons for certain sites.

Experiment: a brief history of favicons

(Larger version)

Here I’m just showing the visits to about a dozen sites I seem to visit (very) regularly, e.g. Twitter, Gmail, Flickr, Google, Technorati, Feedburner, Google Reader, Delicious, etc. You can see that I habitually check Gmail about once per hour, and visit Twitter even more regularly than that.

The code for the DNA one is a bit specific to my groups, so I want to generalise that to work for all groups before I share it, but I’ve put the code on GitHub for the favicon one. It’ll probably only work in Firefox 2 or better. Canvas should work in Safari, but I’ve probably used Mozilla specific stuff for the text. This was a very quick hack, and there’s plenty of scope for enhancements, so let me know if you make any improvements.

MeeTimer and myware and SQLite

I’m interested in the idea of self-interested self-surveillance. Long before we had PMOG (the Passively Multiplayer Online Game, now called The Nethernet) to make a game of it, Seth Goldstein was calling the idea ‘myware’ and building the (short-lived) AttentionTrust site. As Fred Wilson said at the time, “If someone is going to spy on you, it’s probably best if its you.”

With that in mind, I installed MeeTimer over the weekend. It’s a Firefox plugin which…

records where you spend your time online. It does it in a rather useful way, by allowing you to group websites into activities … so you can make sense of where your time is going. Finally, it accumulates time spent on a site over the course of a day…

I’ve been using it for 3 days and it’s giving some interesting food for thought.

MeeTimer

You can even optionally set up ‘tab warnings’ on specific groups (sites you’ve labeled ‘Procrastination’, say) which will pop up with a nice overlay telling you exactly how much time you’ve wasted in this site, and others in the same category (though allows you to click through and ignore the warning just this once or for the current browsing session if you still want to). I’m already finding this feature useful on the handful sites whose feed I’m subscribed to but for some reason still find myself visiting out of habit. (For me, it’s Waxy links and Boing Boing. I love them, but I’d rather be reminded to enjoy them as part of my feed reading routine rather than browsing out of habit. I bet you have your own which make you ask is this really what you want to be doing right now?). A little reminder is really useful for habit-breaking here.

Mostly MeeTimer is just quietly keeping track of a bunch of per-site accumulators, cleverly based on whether Firefox has focus and which is the currently active tab. The results are already interesting. I realised that I was spending a bit less time on Twitter and Flickr, and a bit more time on work webmail, than I thought.

This is all very well, but I want more. Specifically, I wanted to get at the data. Not just the accumulated weekly/daily/monthly (etc) totals and averages, but the number of visits to each site per day. The raw visits. In as much detail as possible. I want CSV exports, or an API, or something. If I’m spending a daily average of 21 minutes on Twitter, how many visits comprise that time? MeeTimer simply doesn’t tell me.

Or does it?

Digging around my Firefox profile, I find a very interesting file at /Library/Application Support/Firefox/Profiles/{profile-id}/meetimer.sqlite. Ooh, I bet I know what that is. So I open up SQLite and start poking.

Sorry. It’s about to get a bit dull from here on in. Unless you get excited about the idea of being able to manipulate this data you’ll probably want to scroll down to the end. Honestly, I won’t mind.

They’ve gone? Right. Let’s get hacking.

$ sqlite
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .restore meetimer.sqlite
sqlite> .tables
deterrent_stats  groups           log            
deterrentlinks   groups_urls      url_maps       
deterrents       ignored_urls     urls

Excellent. We’ve got tables with sensible names and everything. Let’s see what log looks like.

sqlite> .headers on
sqlite> select * from log limit 3;
url_id|startdate|duration|day|week
4|1238324612508|3|200987|200913
5|1238324617244|44|200987|200913
6|1238324647668|17|200987|200913
sqlite> select * from urls limit 3;
id|url
1|mail.google.com
2|www.google.com
3|www.google.co.uk

Lovely. Easy enough then. The groups and groups_urls tables do what you’d expect too. For now, let’s make url_id more meaningful by doing a join with the url table.

sqlite> select
  url_id, duration, day, week, url
from log
left join urls on log.url_id=urls.id
limit 5;
url_id|duration|day|week|url
4|3|200987|200913|google.co.uk
8|40|200987|200913|meetimer.productivefirefox.com
4|16|200987|200913|google.co.uk
11|10|200987|200913|technorati.com
12|14|200987|200913|google.com/reader/

What if we wanted to show the number of visits, the total duration, and the maximum length of duration for visits to Twitter…

sqlite> select
  count(url_id), sum(duration), max(duration), url
from log
left join urls on log.url_id=urls.id
where url = 'twitter.com';
count(url_id)|sum(duration)|max(duration)|url
34|2712|455|twitter.com

Excellent. I wonder what the top seven URLs when ordered by the number of visits?

sqlite> select
  url_id, count(url_id), sum(duration), max(duration), day, week, url
from log
left join urls on log.url_id=urls.id
group by url
order by count(url_id) desc
limit 7;
url_id|count(url_id)|sum(duration)|max(duration)|day|week|url
9|34|2712|455|200989|200914|twitter.com
10|30|1075|249|200989|200914|search.twitter.com
1|22|2505|928|200989|200914|mail.google.com
4|20|206|57|200989|200914|google.co.uk
17|18|476|114|200989|200914|flickr.com
21|10|2480|2125|200989|200914|bbc.co.uk
39|8|13152|10212|200989|200914|webmail.bbc.co.uk

Twitter, with 34 visits. Sheesh. And for comparison, the top 7 sites by total duration of visit?

sqlite> select
  url_id, count(url_id), sum(duration), max(duration), day, week, url
from log
left join urls on log.url_id=urls.id
group by url
order by sum(duration) desc
limit 5;
url_id|count(url_id)|sum(duration)|max(duration)|day|week|url
39|8|13152|10212|200989|200914|webmail.bbc.co.uk
9|34|2712|455|200989|200914|twitter.com
1|22|2505|928|200989|200914|mail.google.com
21|10|2480|2125|200989|200914|bbc.co.uk
12|6|1355|633|200989|200914|google.com/reader/

13152 seconds (3.6 hours) on my work webmail between Sunday morning and Wednesday aftenoon. And all done in 8 visits. Yuck.

Ok. Let’s start thinking about daily summaries. Grouping by day, and then by URL (since I’m not very good at SQL, and don’t know how to limit it to 5 per day, I’ll just manually snip out all but the top 5 for each day for now)…

sqlite> select
  url_id, count(url_id), sum(duration), max(duration), day, url from log
left join urls on log.url_id=urls.id
group by day, url
order by day, sum(duration) desc;
url_id|count(url_id)|sum(duration)|max(duration)|day|url
1|2|306|228|200987|mail.google.com
9|6|296|217|200987|twitter.com
12|2|225|211|200987|google.com/reader/
28|1|128|128|200987|hunch.com
21|1|66|66|200987|bbc.co.uk
[...]
39|3|10222|10212|200988|webmail.bbc.co.uk
21|3|2155|2125|200988|bbc.co.uk
9|18|1494|235|200988|twitter.com
1|12|1003|185|200988|mail.google.com
10|14|777|249|200988|search.twitter.com
[...]
39|5|2930|2667|200989|webmail.bbc.co.uk
1|8|1196|928|200989|mail.google.com
9|10|922|455|200989|twitter.com
12|1|394|394|200989|google.com/reader/
21|6|259|151|200989|bbc.co.uk
[...]

And returning to the original question of just how many visits do I make to Twitter

sqlite> select
  count(url_id) as visits,
  round(sum(duration) / 60.0, 2) as total,
  round(max(duration) / 60.0, 2) as longest
from log
left join urls on log.url_id=urls.id
where url = 'twitter.com'
group by day
order by day;
visits|total|longest
6|4.93|3.62
18|24.9|3.92
10|15.37|7.58

So it seems that on Sunday I made 6 visits for a total of about 5 minutes and a single longest session of 3 and a half minutes. On Monday it was 18 visits for a total of 25 minutes including one session of nearly 4 minutes, while today, 10 visits so far (including one of over 7 minutes) have already added up to over 15 minutes.

.mode csv

in SQLite is handy too, because it changes that list format to look like

visits,total,longest
6,4.93,3.62
18,24.9,3.92
10,15.37,7.58

so it’s trivial to open it in a spreadsheet.

Making graphs from MeeTimer

Even better will be something cunning and programmatic. Maybe in PHP or Ruby or something. Even this exploratory manual approach is fun though. It will obviously be better once I’ve built up a bit more history but now I know that MeeTimer is storing my data in a way that I can access it, I’m even more excited about it. Thanks, MeeTimer. You rock.

Powered by WordPress with GimpStyle Theme design by Horacio Bella.
The postings on this site are my own and don't necessarily represent my employer's positions, strategies or opinions.