Transparency Timing

Web analytics is a growth sector, not least because much web traffic data is readily available: it is generally produced automatically behind the scenes for any web site.

I’ve used the web stats from the official web site of the Prime Minister’s Office, Number10.gov.uk, as an illustration of first-level web stats (see: Presentation to the Royal Statistical Society’s International Conference 2010: Web analytics – a new statistical domain).  These have been a good example of industry standard web stats (ABCe), and a great exemplar for speed of publication and transparency.

Not only are these industry standard data, but No.10 has published them within a day or two of the end of the month.  There’s always a need to balance speed of publication against the need to ensure quality and consistency.  But because this is simple data (“it is what it is”), collected automatically and to industry standard definitions, this timing is quite achievable and reasonable.  Therein lies at least one of the secrets of prompt publication: sort out as much as possible at stages higher up the data food chain.

So for No.10, the time from the end of the monthly data collection period (midnight on the last day) to publication is sometimes measurable in hours rather than days.  In comparison (and from experience), mainstream official and national statistics, with their quality assurance work, are often only available weeks, typically months, sometimes even years after the end of the data collection period.

While these No.10 web stats are arguably not official statistics (see Statistics and Registration Service Act 2007), the UK Statistics Authority best practice guide (Code of Practice: Protocol 2, Practice 1) encourages release at the earliest opportunity: "release statistical reports as soon as they are judged ready, so that there is no opportunity, or perception of opportunity, for the release to be withheld or delayed."

But now the flow of data has stopped.  The last published data was for November 2010.  It just so happens that the earlier data shows page views per visit falling steadily each month….

The three pieces of monthly web data provided are (a) page views, (b) visits and (c) unique visits.  And here's the data so far....

All three measures showed month-on-month reductions through June, July, August and September, then rose a little in October and November.  Technically we would expect some variation here simply because the months do not all have the same number of days, and because the summer months are typically quieter.

A simple and common enough web analytics derivative calculation from these measures is page views per visit (which at the same time standardises for length of month): in the simplest of terms, what I’ll call an “interesting index”.  There are some implicit assumptions going on here.  The most important (given that this is a ratio) is that the number of pages on the site has remained stable.  After all, if there’s less content then there’s bound to be less staying power.
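As a minimal sketch of why the ratio standardises for length of month: if daily traffic is steady, monthly totals grow with the number of days but page views per visit does not. The daily rates below are hypothetical placeholders chosen for round numbers, not No.10 figures, and the function name is my own.

```python
def pages_per_visit(page_views: int, visits: int) -> float:
    """Average number of pages viewed in a single visit."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return page_views / visits


# Hypothetical steady daily traffic (illustrative only, NOT No.10 data).
DAILY_PAGE_VIEWS = 3000
DAILY_VISITS = 1000

for days_in_month in (30, 31):
    page_views = DAILY_PAGE_VIEWS * days_in_month
    visits = DAILY_VISITS * days_in_month
    # The totals differ between the two months, but the ratio is identical.
    print(days_in_month, page_views, visits, pages_per_visit(page_views, visits))
```

The same cancellation is why the ratio says nothing about overall volume: it has to be read alongside the raw page view and visit counts.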

Sure enough, when web traffic was highest, in May 2010 just after the election, an average of 4.2 pages were viewed per visit.  This has declined steadily, month on month, to 2.3 pages per visit in the latest data.  Given that May was a peak, it’s worth looking at June, which averaged 3.0 pages per visit.  So from June, in simple terms, pages viewed per visit fell by 0.7 over 6 months.  At that rate (a reduction of 0.7 every 6 months) the remaining 2.3 would be gone in just over three more six-month periods, about 20 months, leaving no page views per visit at all… (which technically would imply zero visits).
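The straight-line arithmetic can be checked with a few lines of Python. This is only a sketch of the naive extrapolation, using the three figures quoted above (4.2, 3.0 and 2.3 pages per visit) and assuming, as the text does, that June to the latest month counts as a six-month span; the function name is my own.

```python
def months_to_zero(current: float, monthly_decline: float) -> float:
    """Months until a value falling in a straight line reaches zero."""
    if monthly_decline <= 0:
        raise ValueError("decline must be positive for the value to reach zero")
    return current / monthly_decline


# Pages per visit quoted in the text.
MAY_PEAK, JUNE, LATEST = 4.2, 3.0, 2.3
SPAN_MONTHS = 6  # assumption: June to the latest month treated as 6 months

monthly_decline = (JUNE - LATEST) / SPAN_MONTHS

# Time to zero starting from the latest level, and from the May peak.
print(round(months_to_zero(LATEST, monthly_decline), 1))
print(round(months_to_zero(MAY_PEAK, monthly_decline), 1))
```

Starting from the latest level of 2.3 this gives roughly 20 months; the same rate applied from the May peak of 4.2 gives about 36 months. Either way it is a crude straight line through very few points.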

Of course, with what is a relatively small amount of data, this is all more indication and insight than proof.  The trend is almost certainly non-linear, tailing off to a stable level.  We really need more data to tell, and therein, of course, lies the problem.  Sure, this is not the most important data out there, and may indeed stumble at the “so what” or “who cares” tests.  But the whole open data movement is not about pre-judging potential uses or value; it is rather more about assuming “helpful until proven to be unhelpful”.  That starts to become a relative proxy for value for money in relation to data collection costs.

So with an open data movement, a big transparency agenda, and the need to build trust in official public statistics, this surprisingly missing data, which was already pointing to declining web site usage, looks like an awkward oversight, and especially so coming from the seat of government and champion of transparency.