Retrieving Google Analytics data with Python...

Published on Sept. 6, 2011

... or how to pull data about page visits instead of implementing a custom counter

Preface: OK, so you have a website, right? And you are using Google Analytics to track your page views, visitors and so on? (If not, you should consider using it. It is awesome, free, and has lots of features such as custom segments, map overlay, AdSense integration and many more.)
So you know how many people have visited each page of your website, the bounce rate, the average time they spend on a page, etc. But this data is visible only to you and to the people you have granted access.

Problem: But what happens if one day you decide to show public statistics about the visitors of your website? For example: how many people have opened the "Product X" page?
Of course you can add a custom counter that increases the view count each time the page is opened. Developed, tested and deployed in no time. Everyone is happy until one day someone's cat takes a nap on the keyboard and "accidentally" keeps the F5 key pressed for an hour. The result is simple: one of your pages has 100 times more visits than the others. OK, you can fix this by adding cookies, IP tracking, etc. But all this is reinventing the wheel. You already have all this data in your Google Analytics; the only thing you have to do is stretch out your hand and take it.

Solution: In our case "the hand" will be an HTTP request via the Google Data API. First you will need to install the Python version of the API:

sudo easy_install gdata
Once you have the API installed you have to build a client and authenticate:
import gdata.analytics.client

SOURCE_APP_NAME = 'The-name-of-your-app'
my_client = gdata.analytics.client.AnalyticsClient(source=SOURCE_APP_NAME)
# Authenticate with your Google account credentials.
my_client.client_login(
    'USERNAME',
    'PASSWORD',
    source=SOURCE_APP_NAME,
    service=my_client.auth_service,
    account_type='GOOGLE',
)

token = my_client.auth_token
SOURCE_APP_NAME is the name of the application that makes the request. You can set it to anything you like. After you build the client (the AnalyticsClient line) you must authenticate with your Google account (the client_login call). If you have both a Google and a Google Apps account with the same username, be sure to provide the correct account type (the account_type argument). Now you are authenticated and it is time to build the request. Obviously you want to filter the data according to some rules. The easiest way is to use the Data Feed Query Explorer to build and test your filter and then port it to the code. Here is an example of how to get the page views for a specific URL for a single month (remember to update PROFILE_ID according to your profile).
account_query = gdata.analytics.client.AccountFeedQuery()  # not used in this example
data_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:PROFILE_ID',
    'dimensions': '',  # e.g. ga:source,ga:medium
    'metrics': 'ga:pageviews',
    'filters': 'ga:pagePath==/my_url_comes_here/',
    'start-date': '2011-08-06',
    'end-date': '2011-09-06',
    'prettyprint': 'true',
})

# Fetch the feed and collect (metric name, value) pairs from the first entry.
feed = my_client.GetDataFeed(data_query)
result = [(x.name, x.value) for x in feed.entry[0].metric]
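
The URL and the dates in the query are hard-coded. As a minimal sketch, the parameter dictionary could be built by a small helper so the same query can be reused for any page and any date window. The helper name build_pageviews_params and its arguments are my own invention and not part of gdata; only the dictionary keys come from the example.

```python
from datetime import date, timedelta


def build_pageviews_params(profile_id, page_path, end=None, days=31):
    """Return query parameters for the page views of a single URL.

    Hypothetical helper (not part of gdata): it only builds the kind of
    dictionary that is passed to DataFeedQuery in the example.
    """
    end = end or date.today()
    start = end - timedelta(days=days)
    return {
        'ids': 'ga:%s' % profile_id,
        'metrics': 'ga:pageviews',
        'filters': 'ga:pagePath==%s' % page_path,
        # The API expects dates formatted as YYYY-MM-DD.
        'start-date': start.isoformat(),
        'end-date': end.isoformat(),
    }


params = build_pageviews_params('PROFILE_ID', '/my_url_comes_here/',
                                end=date(2011, 9, 6))
print('%s to %s' % (params['start-date'], params['end-date']))
```

The resulting dictionary is meant to be passed to DataFeedQuery in place of the literal one.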

Final words: As you can see, it is relatively easy to get the data from Google, but remember that this code makes two requests to Google each time it is executed, so you will need to cache the result. The GA data is not real-time (if I remember correctly, it is updated once an hour), so you can automate the process to pull the data periodically and store the results on your side, which will really improve the speed. Also keep in mind that this is just an example of how to use the API: instead of pulling the data page by page (as shown above), you may pull the results for multiple URLs at once and process the feed to get your data. It is all in your hands.
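
To make the multiple-URLs idea a bit more concrete, here is a rough sketch. In the filter syntax a comma acts as OR, so several ga:pagePath conditions can go into one 'filters' value, and with 'dimensions' set to 'ga:pagePath' the feed returns one row per page. Both function names are hypothetical, the sample rows are made up, and extracting the rows from the real feed is left out.

```python
def build_paths_filter(paths):
    """Join several page paths into one filter expression (comma = OR).

    Assumes the paths contain no commas or semicolons, which would
    need escaping inside a filter expression.
    """
    return ','.join('ga:pagePath==%s' % path for path in paths)


def pageviews_by_path(rows):
    """Turn (path, pageviews) rows into a {path: int} dictionary."""
    totals = {}
    for path, views in rows:
        # Values are assumed to arrive as strings, so convert them.
        totals[path] = totals.get(path, 0) + int(views)
    return totals


paths_filter = build_paths_filter(['/product-x/', '/product-y/'])

# Stand-in for rows taken from a feed queried with
# 'dimensions': 'ga:pagePath'; the numbers are invented.
sample_rows = [('/product-x/', '120'), ('/product-y/', '45')]
totals = pageviews_by_path(sample_rows)
print(paths_filter)
print(totals)
```

This way one request can feed the counters of many pages.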
Do you have something to add? Cool, I am always open to hearing (reading) your comments and ideas.

Update: If you are using Django, you should consider using Memcached to cache these results, as shown in Caching websites with Django and Memcached.
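
If you are not on Django, the same caching idea can be sketched in plain Python with a small in-process cache that expires after an hour. The class TimedCache and the fake_fetch function are hypothetical names used only for illustration; the one-hour default simply follows my guess about how often GA refreshes its data, and a real setup would probably want a shared cache rather than a dictionary in memory.

```python
import time


class TimedCache(object):
    """Keep one value per key for `ttl` seconds (in-process only)."""

    def __init__(self, ttl=3600, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() when stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._store[key] = (now, value)
        return value


calls = []


def fake_fetch():
    # Stand-in for the real request to Google Analytics.
    calls.append(1)
    return 42


cache = TimedCache(ttl=3600)
first = cache.get_or_fetch('/my_url_comes_here/', fake_fetch)
second = cache.get_or_fetch('/my_url_comes_here/', fake_fetch)
print('values: %s, %s; requests made: %s' % (first, second, len(calls)))
```

The second lookup is served from the cache, so the fetch function runs only once within the hour.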