Retrieving Google Analytics data with Python...
... or how to pull data about page visits instead of implementing a custom counter
Preface: OK, so you have a website, right? And you are using Google Analytics to track your page views, visitors and so on? (If not, you should consider starting to use it. It is awesome, free and has lots of features such as custom segments, map overlay, AdSense integration and many more.)
So you know how many people have visited each page of your website, the bounce rate, the average time they spend on a page etc. And this data is visible only to you, or to the limited number of people you have granted access.
Problem: But what happens if one day you decide to show a public statistic about visitors on your website? For example: how many people have opened the "Product X" page?
Of course you can add a custom counter that increases the views each time the page is opened. Developed, tested and deployed in no time. Everyone is happy until one day someone's cat takes a nap on their keyboard and "accidentally" keeps the F5 button pressed for an hour. The result is simple: one of your pages has 100 times more visits than the others. OK, you can fix this by adding cookies, IP tracking etc., but all of this is reinventing the wheel. You already have all this data in your Google Analytics; the only thing you have to do is stretch out a hand and take it.
Solution: In our case "the hand" will be an HTTP request via the Google Data API. First you will need to install the Python version of the API:
sudo easy_install gdata
import gdata.analytics.client

SOURCE_APP_NAME = 'The-name-of-your-app'

# Create the client and log in with your Google account credentials.
my_client = gdata.analytics.client.AnalyticsClient(source=SOURCE_APP_NAME)
my_client.client_login(
    'USERNAME',
    'PASSWORD',
    source=SOURCE_APP_NAME,
    service=my_client.auth_service,
    account_type='GOOGLE',
)
token = my_client.auth_token
# The account feed query can be used to list the profiles (and their IDs)
# your account has access to -- see the example below.
account_query = gdata.analytics.client.AccountFeedQuery()
# Query the data feed for the pageviews of a single URL.
data_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:PROFILE_ID',
    'dimensions': '',  # e.g. ga:source,ga:medium
    'metrics': 'ga:pageviews',
    'filters': 'ga:pagePath==/my_url_comes_here/',
    'start-date': '2011-08-06',
    'end-date': '2011-09-06',
    'prettyprint': 'true',
})
feed = my_client.GetDataFeed(data_query)
result = [(x.name, x.value) for x in feed.entry[0].metric]
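Two quick notes about the placeholders above. The PROFILE_ID for the 'ids' parameter can be obtained by executing the account feed query created earlier; the snippet below is a minimal sketch assuming the account feed entries expose title and table_id as in the gdata library samples. It also shows what the result list looks like when printed.

# List the profiles your account has access to; the table_id value
# (e.g. 'ga:12345678') is what goes into the 'ids' parameter above.
account_feed = my_client.GetAccountFeed(account_query)
for entry in account_feed.entry:
    print entry.title.text, entry.table_id.text

# The result from the data feed is a list of (metric name, value) pairs,
# e.g. [('ga:pageviews', '1234')]
for name, value in result:
    print name, value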
Final words: As you see, it is relatively easy to get the data from Google, but remember that this code makes two requests to Google each time it is executed, so you will need to cache the result. The GA data is not real-time, so you may automate the process of pulling it (if I remember correctly the data is updated once an hour) and store the results on your side, which will really improve the speed. Also keep in mind that this is just an example of how to use the API: instead of pulling the data page by page (as shown above) you may pull the results for multiple URLs at once and process the feed to get your data, as sketched below. It is all in your hands.
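Here is a rough sketch of the multiple-URLs idea, assuming the same client and profile as above; the two product URLs are made up for the example. Adding ga:pagePath as a dimension splits the pageviews per URL, and a comma between filters means OR in the GA filter syntax, so one request covers several pages.

# One request for several pages: ga:pagePath as a dimension breaks the
# pageviews down per URL, and ',' between filters means OR.
multi_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:PROFILE_ID',
    'dimensions': 'ga:pagePath',
    'metrics': 'ga:pageviews',
    'filters': 'ga:pagePath==/product_x/,ga:pagePath==/product_y/',
    'start-date': '2011-08-06',
    'end-date': '2011-09-06',
})
multi_feed = my_client.GetDataFeed(multi_query)

# Store the numbers on your side, e.g. as {url: pageviews}, so your pages
# read from this cache instead of hitting the API on every view.
pageviews_by_url = {}
for entry in multi_feed.entry:
    pageviews_by_url[entry.dimension[0].value] = int(entry.metric[0].value)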
You have something to add? Cool, I am always open to hear (read) your comments and ideas.
Update: If you are using Django, you should consider using Memcached to cache these results, as shown in Caching websites with Django and Memcached.
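For example, the caching part could look roughly like this with Django's low-level cache API; the get_pageviews helper and the one-hour timeout are just assumptions for the sake of the example.

from django.core.cache import cache

def cached_pageviews(url):
    # Try the cache first and fall back to Google Analytics;
    # get_pageviews is a hypothetical wrapper around the code above.
    key = 'ga_pageviews_%s' % url
    views = cache.get(key)
    if views is None:
        views = get_pageviews(url)
        # GA data is updated roughly once an hour, so cache for an hour.
        cache.set(key, views, 60 * 60)
    return views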