EDIT (2019): The below answer predates GDPR and likely requires revision.
Google Analytics has a new set of APIs to assist with compliance with a cookie opt-out. Here's the documentation, and here's their help docs.
There has been some ambiguity as to whether the EU Cookie Regulations (as implemented in member countries) require that passive web analytics tracking requires opt-in mechanisms for compliance. If you're concerned one way or another, consult an attorney. Google is empowering you to make the decision as to how you want to proceed.
They'll leave implementation details to you, but, the idea is, once you've determined whether or not to track the user in Google Analytics, if the answer is to not track, you'd set the following property to true before Google Analytics runs:
window['ga-disable-UA-XXXXXX-Y'] = true;
Where UA-XXXXXX-Y is your account ID in Google Analytics
As the other posters have noted, Google Analytics relies on cookies. So, you're not able to do any kind of tracking without cookies. If you've determined that someone is not to be cookied for tracking, you'll need to implement something like this:
if(doNotCookie()){
window['ga-disable-UA-XXXXXX-Y'] = true;
}
Opt In
This does require a little bit of jujitsu for when you first load Google Analytics, since this property will need to be set before Google Analytics runs to prevent tracking from ever happening, which means, for an "opt in to tracking" approach, you'd probably need to implement a mechanism where, on first visit, Google Analytics is automatically disabled in the absence of an opt-in cookie (cookies that determine cookie preferences are explicitly allowed), and then, if an opt-in happens, re-runs Google Analytics. On subsequent pageviews, all would run smoothly.
Could look something like (pseudo-code):
if( hasOptedOut() || hasNotExpressedCookiePreferenceYet() ){ //functions you've defined elsewhere
window['ga-disable-UA-XXXXXX-Y'] = true;
}
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXXXX-Y']);
_gaq.push(['_trackPageview']);
function onOptIn(){ //have this run when/if they opt-in.
window['ga-disable-UA-XXXXXX-Y'] = false;
//...snip...
//set a cookie to express that the user has opted-in to tracking, for future pageviews
_gaq.push(['_trackPageview']); // now run the pageview that you 'missed'
}
Opt Out
With this approach, you'd allow the user to opt-out of tracking, which would mean you'd use a cookie to set the ga-disable-UA-XXXXXX-Y'
property and a cookie to manage it in the future:
if( hasOptedOut() ){ // function you've defined elsewhere
window['ga-disable-UA-XXXXXX-Y'] = true;
}
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXXX-Y']);
_gaq.push(['_trackPageview']);
This is a really great idea. I'm a little late to this, but you should be able to accomplish this by downloading all of the data using the Google Analytics Reporting API, store it in a local database/file/whatever, and then build your recommendation engine by aggregating the statistics by hand and storing them locally.
To get the data from the Reporting API, try playing with the query explorer and extracting the number of visits to pages between all pairs of paths using a method similar to @carlsoja:
dimensions=ga:previousPagePath,ga:pagePath&metrics=ga:visits
In order to get all of the data, you will have to use one of the Core Reporting Client Libraries to paginate through the results (which you can experiment with in the query explorer).
Once you have all of the data, you can pretty easily calculate the Markov Chain transition probabilities that a person visits page /A
after they have visited page /B
, or p(/A | /B)
. Then it would be pretty straightforward to estimate the probability that someone visits page /A
if they visited page /B
at some point in the past. If you wanted to get really fancy, you could use their complete history {H}
to make recommendations for pages by estimating p(/A | {H})
, but I'll leave that as an exercise for the reader ;)
Hope this helps!
Best Answer
Yes - you will need to use the Google Analytics API for this. I would suggest checking out the Query Explorer to get a feel for the queries you will need to create.
You will require numerous queries to get all the data you need (adjusting the starting date): - Total Views - This Year - This Month - This Week (i.e. last 7 days - from which you could also get Total Today).
Here is an example query:
Alternatively, you might want to consider www.embeddedanalytics.com (disclosure - I work with them). We have a service/platform that allows website owners to embed GA based charts/statistics without having to learn the GA API. We have a CMS version which will do exactly what you need (where you script the call to pass the page path). We have done something like this with a number of podcast sharing sites.