Posts tagged with google analytics

Retrieving Google Analytics data with Python...

Published at Sept. 6, 2011 | Tagged with: , , , , , , ,

... or how to pull data about page visits instead of implementing custom counter

Preface: OK, so you have a website, right? And you are using Google Analytics to track your page views, visitors and so on?(If not you should reconsider to start using it. It is awesome, free and have lost of features as custom segments, map overlay, AdSense integration and many more.)
So you know how many people have visited your each page of your website, the bounce rate, the average time they spend on the page etc. And this data is only for you or for a certain amount whom you have granted access.

Google Analytics

Problem: But what happens if one day you decided to show a public statistic about visitors on your website. For example: How many people have opened the "Product X" page?
Of course you can add a custom counter that increases the views each time when the page is open. Developed, tested and deployed in no time. Everyone is happy until one day someones cat took a nap on his keyboard and "accidentally" kept the F5 button pressed for an hour. The result is simple - one of you pages has 100 times more visits than the other. OK, you can fix this with adding cookies, IP tracking etc. But all this is reinventing the wheel. You already have all this data in your Google Analytics, the only thing you have to do is to stretch hand and take it.

Solution: In our case "the hand" will be an HTTP request via the Google Data API. First you will need to install the Python version of the API:

sudo easy_install gdata
Once you have the API installed you have to build a client and authenticate:
SOURCE_APP_NAME = 'The-name-of-you-app'
my_client = gdata.analytics.client.AnalyticsClient(source=SOURCE_APP_NAME)
my_client.client_login(
    'USERNAME',
    'PASSWORD',
    source=SOURCE_APP_NAME,
    service=my_client.auth_service,
    account_type = 'GOOGLE',
)

token = my_client.auth_token
SOURCE_APP_NAME is the name of the application that makes the request. You can set it to anything you like. After you build the client(2) you must authenticate using your Google account(3-9). If you have both Google and Google APPs account with the same username be sure to provide the correct account type(8). Now you have authenticated and it is time to build the request. Obviously you want to filter the data according some rules. The easiest way is to use the Data Feed Query Explorer to build your filter and test it and then to port it to the code. Here is an example how to get the data about the page views for specific URL for a single month(remember to update the PROFILE_ID according to your profile).
account_query = gdata.analytics.client.AccountFeedQuery()
data_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:PROFILE_ID',
    'dimensions': '', #ga:source,ga:medium
    'metrics': 'ga:pageviews',
    'filters': 'ga:pagePath==/my_url_comes_here/',
    'start-date': '2011-08-06',
    'end-date': '2011-09-06',
    'prettyprint': 'true',
    })

feed = my_client.GetDataFeed(data_query)
result = [(x.name, x.value) for x in feed.entry[0].metric]

Final words: As you see it is relatively easy to get the data from Google but remember that this code makes two request to Google each time it is executed. So you will need to cache the result. The GA data is not real-time so you may automate the process to pull the data(if I remember correctly the data is updated once an hour) and store the results at your side which will really improve the speed. Also have in mind that this is just an example how to use the API instead of pulling the data page by page(as show above) you may pull the results for multiple URLs at once and compute the feed to get your data. It is all in your hands.
You have something to add? Cool I am always open to hear(read) you comments and ideas.

Update: If you are using Django you should consider to use it Memcached to cache these result as shown in Caching websites with Django and Memcached

Google Analytics referring sites and https

Published at Sept. 16, 2010 | Tagged with: , , ,

The problem: recently a client reported the he does not see another site he owns in the list of referring sites in his Google Analytics account(for his main website) but was absolutely sure that he recieves trafic from this website.

The research and the reason: at first I thought that the problem is cause by the 301 redirects the site use to send you to the propper language version of the website(cookies based). But after some reasearch I found that these 301 redirects keep the original referrer so the problem was somewhere else. So I open a page from the second site, click a link to the main one and run "document.referrer" in the console, strange but the answer was empty. I tried it clicking link from other site and everything was ok, the result was the referring site so the problem really was with this one. After some more wondering what is really going on I finally noticed that the problem website was working under secured(https) connection.
According to RFC2616 section 15.1.3

Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.

OK, so the reason for the problem was found but how to solve it?

The solution: Google Analytics allows you to override the referral URL so the only thing you need is to add get parameter to the link url with the url of the pointing page, for example: http://main.com/news/?referral_site=second-website.com
Then place some logic that will check for "referral_site" parameter and use the_setReferrerOverride() gladly provided by the GA team to override the referral address.

If you have another solutions or questions please let me know.