Posts tagged with web development

Django for Web Prototyping

Published at April 15, 2013

Or how to use the benefits of the Django template system during the PSD-to-HTML phase

There are two main approaches to starting the design of a new project: a Photoshop mock-up or an HTML prototype. The first is more traditional and well established in the web industry. The second is the alternative and (maybe) more modern one. I remember a video of Jason Fried from 37 Signals where he talks about design and creativity. You can see it at http://davegray.nextslide.com/jason-fried-on-design. There he explains how he stays away from Photoshop in the initial phase to concentrate on the things you can interact with instead of focusing on design details.

I am not going to argue which method is better. The important thing is that sooner or later you get to the point where you have to start the HTML coding. Unfortunately, this often happens in a pure HTML/CSS environment outside of the Django project, and then we waste extra time converting the result to Django templates.

Wouldn't it be awesome if you could give the front-end developers something they can install and run with a simple command, while still letting them work in the Django environment with all the benefits it provides: template nesting and including, sekizai tags, etc.?

I have been planning to do this for a long time, and it is finally ready and available at Django for Prototyping. Currently the default template includes Modernizr, jQuery and jQuery UI, but you can easily modify it according to your needs. I would be glad to receive any feedback and ideas for improvement, so feel free to try it and comment.

Simple Site Checker and the User-Agent header

Published at Oct. 22, 2012

Preface: Nine months ago (I can't believe it was that long) I created a script called Simple Site Checker to make it easier to check sitemaps for broken links. The script code is publicly available at GitHub. Yesterday (now that I have finally found time to finish this post it must be "a few weeks ago") I decided to run it again on this website and nothing happened: no errors, no warnings, nothing. Setting the output level to DEBUG showed the message "Loading sitemap ..." and then the script exited.
Here the fault was mine: I had missed a corner case in the error-catching mechanism, namely when the sitemap URL returns something other than "200 OK" or "500 Internal Server Error". A few seconds later the mistake was fixed.

Problem and Solution: I ran the script again and, what a surprise, the sitemap URL was returning "403 Forbidden". At the same time the sitemap was perfectly accessible in my browser. After some thinking I remembered that some security plugins block access to a website if no User-Agent header is supplied. The purpose is to block access by simple scripts. In my case even an empty User-Agent did the trick and fooled the plugin.

urllib2.urlopen(urllib2.Request(url,
                                headers={'User-Agent': USER_AGENT}))
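
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only an illustration and not the code of Simple Site Checker: the USER_AGENT value and the helper names build_request and fetch_status are assumptions made for this example. It also shows one way to cover the corner case from the preface, by treating every error status the same way instead of checking only for 200 and 500.

```python
import urllib.error
import urllib.request

# Hypothetical value; the USER_AGENT string of the original script is not shown.
USER_AGENT = 'simple-site-checker-example/0.1'


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that always carries a User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Error statuses such as 403, 404 or 500 are raised as HTTPError.
        return error.code
    except urllib.error.URLError:
        # DNS failure, refused connection and similar network problems.
        return None
```

Calling fetch_status on a sitemap URL then returns 403 for a blocked request instead of silently exiting.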

Final words: As a result of the issue mentioned above, one bug in Simple Site Checker was found and fixed. At the same time another issue, about missing status and progress output, was raised. More details can be found at GitHub, but in a few words: an info message is now logged for each processed URL to indicate the progress.

If you have any ideas for improvement or anything else feel free to comment, create issues and/or fork the script.

Automated deployment with Ubuntu, Fabric and Django

Published at Sept. 18, 2012

A few months ago I started to play with Fabric, and the result was a simple script that automates the creation of a new Django project. Over the last months I continued my experiments and extended the script into a full stack for creating and deploying Django projects.

As the details behind the script, such as the project structure I use and the server setup, are a bit long, I will keep this post focused on the script usage and write a follow-up post describing the project structure and the server.

In brief, the setup I use consists of Ubuntu as the OS, Nginx as the web server and uWSGI as the application server. The last one is controlled by Upstart. The script is available for download at GitHub.
While waiting for more detailed documentation, here is a short description of the main tasks and what they do:

startproject:<project_name>

  • Creates a new virtual environment
  • Installs the predefined packages (uWSGI, Django and South)
  • Creates a new Django project from the predefined template
  • Creates different configuration files for the development and production environments
  • Initializes a new Git repository
  • Prompts the user to choose a database type, then installs the packages required by the database, creates a new database and user named after the project, and generates the local settings file
  • Runs syncdb (if a database is selected) and collectstatic

setup_server

  • Installs the required packages such as Nginx, pip, GCC, etc.
  • Prompts for a database type and installs it
  • Reboots the server, as some of the packages may require it

Once you have a server and a project ready, just use the following task to deploy the project to the server. Please keep in mind that it should be used only for the initial deployment.

deploy_project:<project_name>,<env_type>,<project_repository>

  • Creates a new virtual environment
  • Clones the project from the repository
  • Installs the required packages
  • Creates symbolic links to the nginx/uwsgi configuration files
  • Asks for database engine, creates new DB/User with the project name and updates the settings file
  • Calls the update_project task

update_project:<project_name>,<env_type>

  • Updates the source code
  • Installs the required packages (if there are new ones)
  • Runs syncdb, migrate and collectstatic
  • Restarts the Upstart job that runs uWSGI, and the Nginx server
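
To make the update steps more concrete, here is a rough sketch of the shell commands such a task might issue, written as plain Python so it runs without Fabric installed. It is not taken from the actual script: the /srv/<project_name> layout, the env/ virtualenv directory, the settings module naming and the Upstart job name are all assumptions made for this illustration. In a Fabric 1.x task each string would typically be passed to run() or sudo().

```python
def update_project_commands(project_name, env_type):
    """Build the shell commands for the update steps, in the order listed above.

    Every path, service name and settings module used here is hypothetical.
    """
    project_dir = '/srv/%s' % project_name
    python = '%s/env/bin/python' % project_dir
    pip = '%s/env/bin/pip' % project_dir
    manage = '%s %s/manage.py' % (python, project_dir)
    settings = '--settings=%s.settings.%s' % (project_name, env_type)
    return [
        # Update the source code.
        'cd %s && git pull' % project_dir,
        # Install any new requirements into the virtual environment.
        '%s install -r %s/requirements.txt' % (pip, project_dir),
        # Sync the database, apply South migrations, gather static files.
        '%s syncdb --noinput %s' % (manage, settings),
        '%s migrate %s' % (manage, settings),
        '%s collectstatic --noinput %s' % (manage, settings),
        # Restart the Upstart job for uWSGI (assumed to be named after
        # the project) and then Nginx.
        'sudo service %s restart' % project_name,
        'sudo service nginx restart',
    ]
```

Keeping the command list separate from the code that executes it makes it easy to print and review the commands before running them against a real server.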

The script is still in development, so use it at your own risk. It also reflects my own idea of a server/application setup (I am planning to describe it in more depth in a follow-up post), but I would really like it if you tried it and gave me feedback. Feel free to fork, comment and advise.

The road to hell is paved with regular expressions ...

Published at July 31, 2012

... or what is the cost of using regular expressions for simple tasks

Regular expressions are one of the most powerful tools in computing I have ever seen. My previous post about Django compressor and image preloading is a good example of how useful they can be. The only limit to their use is your imagination. But "with great power comes great responsibility", or in this case a great cost. Even the simplest expressions can be quite heavy compared with other methods.

The reason to write about this is a question recently asked in a Python group. It was about how to get the elements of a list that match a specific string. My proposal was to use a list comprehension and a simple string comparison, while another member proposed using a regular expression. I was pretty sure that the regular expression would be slower, but not sure exactly how much slower, so I made a simple test to find out.

import re
import timeit

my_list = ['abc-123', 'def-456', 'ghi-789', 'abc456', 'abc', 'abd']

def re_check():
    return [i for i in my_list if re.match('^abc$', i)]

t = timeit.Timer(re_check)
print 're_check result >>', re_check()
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

def simple_check():
    return [i for i in my_list if i=='abc']

t = timeit.Timer(simple_check)
print 'simple_check result >>', simple_check()
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

The results were 23.99 vs 1.41 usec/pass for the regular expression and the direct comparison respectively, i.e. the regexp was 17 times slower. A difference like the one in the example above may be OK in some cases, but the absolute cost grows with the size of the list. This is a simple example of how something really quick on your local version may take significant time in production and even break your application.
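
For anyone who wants to repeat the measurement on Python 3, here is a sketch of the same test. The compiled_check variant and the helper names usec_per_pass and report are additions made for this example; they are there to show how much of the cost comes from looking up the pattern on every call. The timings depend entirely on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

my_list = ['abc-123', 'def-456', 'ghi-789', 'abc456', 'abc', 'abd']
ABC_PATTERN = re.compile('^abc$')


def re_check():
    # The pattern string is resolved through the re module cache on each call.
    return [i for i in my_list if re.match('^abc$', i)]


def compiled_check():
    # Same regular expression, compiled once at import time.
    return [i for i in my_list if ABC_PATTERN.match(i)]


def simple_check():
    # Plain string equality, no regular expression involved.
    return [i for i in my_list if i == 'abc']


def usec_per_pass(func, number=100000):
    """Average time of one call to func, in microseconds."""
    return 1000000 * timeit.timeit(func, number=number) / number


def report(number=100000):
    """Print the average time per pass for each of the three variants."""
    for func in (re_check, compiled_check, simple_check):
        print('%s: %.2f usec/pass' % (func.__name__, usec_per_pass(func, number)))
```

Calling report() prints one line per variant, which makes it easy to compare the three approaches on your own data.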

So, should you learn and use regular expressions?
Yes! Absolutely!

They are powerful and useful. They will open your mind and allow you to do things you haven't done before. But remember that they are a double-edged sword and should be used cautiously. If you can replace a regular expression with one or more plain comparisons, run a quick test to see whether that is faster. Of course, if you cannot avoid it, you can also think about caching the results.

Django compressor and image preloading

Published at July 30, 2012

Preface: Have you noticed how on some websites, when you click a link that opens a lightbox or any other overlay for the first time, it takes a while to display the border/background/button images? Not quite fancy, right?
This is because these images start loading at the moment the overlay is rendered on the screen. If this is your first visit and the images are not in your browser cache, it will take some time for the browser to retrieve them from the server.

Solution: The solution is to preload the images, i.e. to force the browser to request them from the server before they are actually used. With a simple JavaScript function and a list of the image URLs this is a piece of cake:

$.preLoadImages = function() {
    var args_len = arguments.length;
    for (var i=0; i < args_len; i++) {
        var cacheImage = document.createElement('img');
        cacheImage.src = arguments[i];
    }
}

$.preLoadImages('/img/img1.png', '/img/img2.png')

Please keep in mind that the code above uses the jQuery library.

Specialty: Pretty easy, but you have to hardcode the URLs of all images. Also, if you are using Django compressor you are probably aware that it adds an extra hash to the URLs of the images in the compressed CSS files. The hash depends on the COMPRESS_CSS_HASHING_METHOD setting and cannot be avoided. It is pretty useful because it forces the client browser to reload the images every time something has changed. Unfortunately, our hardcoded list of URLs does not have this hash. So wouldn't it be much simpler if, instead of hardcoding the URLs, we just read them from the CSS files?

Solution 2:

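$.preLoadImages = function() {
    // Fetch the first stylesheet on the page and scan it for image URLs.
    $.get($('link[rel="stylesheet"]')[0].href, function(data) {
        // Capture the URL inside url(...), including any hash or query suffix.
        // The URL part cannot contain quotes, whitespace or ")", so a match
        // never runs across two url(...) declarations in minified CSS.
        var r = /url\(\s*['"]?([^'")\s]+\.(?:gif|jpe?g|png)[^'")\s]*)['"]?\s*\)/ig;
        var match;
        while ((match = r.exec(data)) !== null) {
            var cacheImage = document.createElement('img');
            cacheImage.src = match[1];
        }
    });
};

$.preLoadImages();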

Now, with the help of regular expressions, we can read the image URLs directly from the CSS file together with the hash part. Please note the zero index in the CSS file selector: if your main CSS is not the first declared stylesheet, you will have to change the index according to its position.
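
If you want to check offline which URLs such a pattern picks up from a compressed stylesheet, the extraction can be tried in Python. This is only a sketch: the pattern is a slightly stricter variant written for this example (it does not allow quotes, whitespace or a closing parenthesis inside the URL), and the function name image_urls and the sample CSS are made up.

```python
import re

# Matches url(...) declarations that point to gif/jpg/jpeg/png files and
# captures the URL together with any trailing hash or query string.
IMAGE_URL = re.compile(
    r"""url\(\s*['"]?([^'")\s]+\.(?:gif|jpe?g|png)[^'")\s]*)['"]?\s*\)""",
    re.IGNORECASE,
)


def image_urls(css_text):
    """Return every image URL referenced in css_text, in order of appearance."""
    return IMAGE_URL.findall(css_text)


# Made-up sample of minified CSS with a hashed image URL and a font.
sample_css = (
    '.a{background:url("/img/a.png?abc123")}'
    '.b{background:url(/img/b.JPG)}'
    '.c{src:url(font.woff)}'
)
print(image_urls(sample_css))
```

The font URL is skipped because its extension is not in the list, while the hash after a.png is kept as part of the captured URL.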

I hope you will find these solutions simple and useful. As always, feel free to comment, share and propose code improvements.

HTTP Status Codes Site

Published at Feb. 1, 2012

During the development of Simple Site Checker I realised that, for test purposes, it would be useful to have a website that returns all possible HTTP status codes. Thanks to Google App Engine and the webapp2 framework, building such a website was a piece of cake.

The site can be found at http://httpstatuscodes.appspot.com.

The home page provides a list of all HTTP status codes and their names. If you want to get an HTTP response with a specific status code, just add the code after the slash, for example:
http://httpstatuscodes.appspot.com/200 - returns 200 OK
http://httpstatuscodes.appspot.com/500 - returns 500 Internal Server Error
Also, at the end of each page there is a link to the HTTP protocol Status Code Definitions, with a detailed explanation of each code.
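
The idea of answering with the status code taken from the URL can be sketched with nothing but the Python 3 standard library. This is not the code of the site (which runs on Google App Engine with webapp2). The names status_from_path and StatusHandler are made up for this example, unknown paths simply get a 404, and no special care is taken for codes that must not carry a body.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler


def status_from_path(path):
    """Return the HTTPStatus named by a path such as '/404', or None."""
    code = path.strip('/')
    if not code.isdigit():
        return None
    try:
        return HTTPStatus(int(code))
    except ValueError:
        # A number that is not a registered HTTP status code.
        return None


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Paths that do not name a known status code are answered with 404.
        status = status_from_path(self.path) or HTTPStatus.NOT_FOUND
        body = ('%d %s\n' % (status.value, status.phrase)).encode('ascii')
        self.send_response(status.value, status.phrase)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(('127.0.0.1', 8000), StatusHandler).serve_forever()
```

With the server running, requesting /500 from it returns a "500 Internal Server Error" response, which is exactly the kind of target a link checker needs for its tests.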

The website code is publicly available on GitHub at HTTP Status Codes Site.

If you find it useful feel free to comment and/or share it.

Language redirects for multilingual sites with Django CMS ...

Published at Sept. 11, 2011

... or how to avoid duplicate content by keeping the current language in the URL

Preface: Earlier this year I posted about the Django CMS 2.2 features that I want to see, and one of the things mentioned there was that once you have chosen the language of the site, it does not matter whether you open "/my_page/" or "/en/my_page/": both show the same content. The problem is that this can be considered both duplicate and inconsistent content.
Duplicate because you see the same content with and without the language code in the URL, and inconsistent because for the same URL you can get different language versions, i.e. different content.

Solution: This can be easily fixed with a custom middleware that redirects URLs that do not contain a language code. In my case the middleware is stored in "middleware/URLMiddlewares.py" (the path is relative to my project root directory) and contains the following code.

from cms.middleware.multilingual import MultilingualURLMiddleware 
from django.conf import settings
from django.http import HttpResponseRedirect
from django.utils import translation

class CustomMultilingualURLMiddleware(MultilingualURLMiddleware): 
    def process_request(self, request):
        lang_path = request.path.split('/')[1]
        if lang_path in settings.URLS_WITHOUT_LANGUAGE_REDIRECT:
            return None
        language = self.get_language_from_request(request) 
        translation.activate(language) 
        request.LANGUAGE_CODE = language
        if lang_path == '': 
            return HttpResponseRedirect('/%s/' % language)
        if len([z for z in settings.LANGUAGES if z[0] == lang_path]) == 0:
            return HttpResponseRedirect('/%s%s' % (language, request.path))

Now a little explanation of what happens in this middleware. Note: If you are not familiar with how middlewares work, go and check Django Middlewares. The numbers in parentheses refer to the lines of the code above.

Back to the code. First we split the URL by '/' and take the second element (this is where our language code should be) and store it in lang_path (8). URLS_WITHOUT_LANGUAGE_REDIRECT is just a list of URLs that should not be redirected; if lang_path matches any of them we return None, i.e. the request is not changed (9-10). This is used for sections of the site that are not language specific, for example media files. Then we get the language based on the request (11-13). If lang_path is empty, the user has requested the home page and we redirect them to the correct language version of it (14-15). If lang_path does not match any of the declared languages, the language code is missing from the URL and the user is redirected to the correct language version of the page (16-17).

To make the middleware work you have to update your settings.py. First, add the middleware to your MIDDLEWARE_CLASSES; in my case the path is 'middleware.URLMiddlewares.CustomMultilingualURLMiddleware'. Second, add the URLS_WITHOUT_LANGUAGE_REDIRECT list and place there the URLs that should not be redirected, for example:

URLS_WITHOUT_LANGUAGE_REDIRECT = [
    'css',
    'js',
]

Specialties: If the language code is not in the URL and no language cookie is set, your browser settings will be used to determine your preferred language. Unfortunately, most users do not know about this option and it often stays at its default value. If you want this setting to be ignored, just add the following code after line 10 in the middleware above:

if 'HTTP_ACCEPT_LANGUAGE' in request.META:
    del request.META['HTTP_ACCEPT_LANGUAGE']

It removes the HTTP_ACCEPT_LANGUAGE header sent by the browser, so Django falls back to the language set in its settings as the default.

URLS_WITHOUT_LANGUAGE_REDIRECT is extremely useful if you are developing with the built-in dev server and serving the media files through it. But once you put your website in production, I strongly encourage you to serve these files directly from the web server instead of using Django's static serve.
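
The redirect decision made by the middleware can also be looked at in isolation, without Django or Django CMS installed. The sketch below mirrors the same three checks as a plain function. The name redirect_target and its parameters are made up for this example, and the language is passed in directly instead of being detected from the request.

```python
def redirect_target(path, language, languages, urls_without_redirect):
    """Return the URL to redirect to, or None if the request should pass through.

    path is assumed to start with '/', as request.path does in Django.
    languages has the same shape as settings.LANGUAGES: (code, name) pairs.
    """
    lang_path = path.split('/')[1]
    if lang_path in urls_without_redirect:
        # Sections that are not language specific, e.g. css or js.
        return None
    if lang_path == '':
        # Home page requested without a language code.
        return '/%s/' % language
    if lang_path not in [code for code, _name in languages]:
        # First path segment is not a language code, so prepend one.
        return '/%s%s' % (language, path)
    return None


LANGUAGES = [('en', 'English'), ('bg', 'Bulgarian')]
SKIP = ['css', 'js']
print(redirect_target('/my_page/', 'en', LANGUAGES, SKIP))
```

Having the rule as a pure function makes it easy to unit test the edge cases (home page, excluded prefixes, already prefixed URLs) before wiring it into the middleware.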

Final words: In Django 1.4 there will be big changes to multilingual URLs, but until then you can use this code to improve your website's SEO. Any ideas for improvement will be appreciated.

Retrieving Google Analytics data with Python...

Published at Sept. 6, 2011

... or how to pull data about page visits instead of implementing custom counter

Preface: OK, so you have a website, right? And you are using Google Analytics to track your page views, visitors and so on? (If not, you should consider starting to use it. It is awesome, free and has lots of features such as custom segments, map overlay, AdSense integration and many more.)
So you know how many people have visited each page of your website, the bounce rate, the average time they spend on the page, etc. And this data is only for you and for the limited number of people to whom you have granted access.

Problem: But what happens if one day you decide to show public statistics about the visitors to your website? For example: how many people have opened the "Product X" page?
Of course you can add a custom counter that increments the views each time the page is opened. Developed, tested and deployed in no time. Everyone is happy until one day someone's cat takes a nap on the keyboard and "accidentally" keeps the F5 key pressed for an hour. The result is simple: one of your pages has 100 times more visits than the others. OK, you can fix this by adding cookies, IP tracking, etc. But all this is reinventing the wheel. You already have all this data in your Google Analytics; the only thing you have to do is reach out and take it.

Solution: In our case "the hand" will be an HTTP request via the Google Data API. First you will need to install the Python version of the API:

sudo easy_install gdata

Once you have the API installed you have to build a client and authenticate:

import gdata.analytics.client

SOURCE_APP_NAME = 'The-name-of-your-app'
my_client = gdata.analytics.client.AnalyticsClient(source=SOURCE_APP_NAME)
my_client.client_login(
    'USERNAME',
    'PASSWORD',
    source=SOURCE_APP_NAME,
    service=my_client.auth_service,
    account_type='GOOGLE',
)

token = my_client.auth_token

SOURCE_APP_NAME is the name of the application that makes the request. You can set it to anything you like. After you build the client (the AnalyticsClient call) you must authenticate using your Google account (the client_login call). If you have both a Google and a Google Apps account with the same username, be sure to provide the correct account type (the account_type argument).

Now you have authenticated and it is time to build the request. Obviously you want to filter the data according to some rules. The easiest way is to use the Data Feed Query Explorer to build and test your filter, and then port it to the code. Here is an example of how to get the page views for a specific URL for a single month (remember to update PROFILE_ID according to your profile).

account_query = gdata.analytics.client.AccountFeedQuery()
data_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:PROFILE_ID',
    'dimensions': '', #ga:source,ga:medium
    'metrics': 'ga:pageviews',
    'filters': 'ga:pagePath==/my_url_comes_here/',
    'start-date': '2011-08-06',
    'end-date': '2011-09-06',
    'prettyprint': 'true',
    })

feed = my_client.GetDataFeed(data_query)
result = [(x.name, x.value) for x in feed.entry[0].metric]

Final words: As you can see, it is relatively easy to get the data from Google, but remember that this code makes two requests to Google each time it is executed, so you will need to cache the result. The GA data is not real-time, so you can automate the process of pulling the data (if I remember correctly, the data is updated once an hour) and store the results on your side, which will really improve the speed. Also keep in mind that this is just an example of how to use the API: instead of pulling the data page by page (as shown above), you can pull the results for multiple URLs at once and process the feed to get your data. It is all in your hands.
Do you have something to add? Cool, I am always open to hearing (reading) your comments and ideas.

Update: If you are using Django you should consider using Memcached to cache these results, as shown in Caching websites with Django and Memcached.
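
If you are not on Django, the caching idea can be illustrated with a small in-process cache that keeps each result for a fixed time. This is only a sketch under stated assumptions: the TTLCache class and the fetch callback are made up for this example, and the one-hour lifetime follows the "updated once an hour" recollection above rather than any documented guarantee.

```python
import time


class TTLCache(object):
    """Keep fetched values for ttl_seconds before fetching them again."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # Injectable so the expiry logic can be tested.
        self._store = {}

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only when needed."""
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]
        value = fetch()
        self._store[key] = (now, value)
        return value


# Hypothetical usage: fetch_pageviews would wrap the Google Analytics call.
# cache = TTLCache(ttl_seconds=3600)
# views = cache.get_or_fetch('/my_url_comes_here/', fetch_pageviews)
```

The page path works well as the cache key, so repeated page loads within the hour cost a dictionary lookup instead of two requests to Google.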

I become a PhD Student

Published at March 14, 2011

I am happy to say that I successfully passed the entry exams for PhD students, so now I am part of the SULSIT (State University of Library Studies and Information Technologies) PhD program. The program's name is "Automated Systems for Information processing and information management" and my thesis will be "Research of the current methods and technologies for web sites and web application development". The dissertation has to be in Bulgarian, but I will try to post the table of contents translated into English, along with a short summary of each chapter.
As always any ideas and suggestions are welcome.

Django CMS 2.2 features that I want to see

Published at Feb. 28, 2011

... this is the "Post on request" for February 2011

Preface: I have to admit that I was expecting more interest in the "Post on request" topic. Probably my blog is too young for this, but I think I will try again (sooner or later). The more important thing is that Jonas Obrist is the indisputable winner of this month's "contest".

Features: One of the most useful features in the next Django CMS would be the ability to copy a placeholder's content between the different language versions of a page. For example, imagine that you have a home page with several placeholders, each with several plugins/snippets inside (latest news, featured products, etc.). When creating a new language version of the page, it is really annoying to have to add these one by one.

Another feature, requested by my colleague Miro, is somewhat the opposite of the one I want: he wants to be able to use a different page template for each language version of a page. I also find this useful, because sometimes your language versions are not fully mirrored even on the same page.

Bugs: I am not sure that this is actually a bug, but I think it is bad for SEO so I will mention it. If you have a multilingual website, once the language is set in the cookie the same page is displayed whether or not the language code is in the URL. For example, "/en/news/" is equivalent to "/news/". This causes duplicate content, which is considered bad for SEO, and is also misleading because visits to "/news/" with different languages in the cookie return different content. I have a fix for this that will be presented in the next post.

Conclusion: Thanks to all participants and to the guys at Django CMS. You do a really amazing job. I hope that you will like the features I proposed and that we will see them in the next version. Comments and replies are welcome as ever.
