Posts tagged with Python

Django project file structure

Published at Sept. 24, 2012

As I promised in Automated deployment with Ubuntu, Fabric and Django, I will use this post to explain the file structure that I use for my Django projects and what benefits I get from it. So here is my project directory tree.

The structure

~/workspace/<project_name>/
|-- bin
|-- include
|-- lib
|-- local
|-- src
   |-- .git
   |-- .gitignore
   |-- required_packages.txt
   |-- media
   |-- static
   |-- <project_name>
   |   |-- <project_name>
   |   |   |-- __init__.py
   |   |   |-- settings
   |   |   |   |-- __init__.py
   |   |   |   |-- <environment_type>.py
   |   |   |   |-- local.py
   |   |   |-- templates
   |   |   |-- urls.py
   |   |   |-- views.py
   |   |-- manage.py
   |   |-- wsgi.py
   |-- <project_name>.development.nginx.local.conf
   |-- <project_name>.<environment_type>.nginx.uwsgi.conf
   |-- <project_name>.<environment_type>.uwsgi.conf

Explanation

At the top there is a directory named after the project with a virtual environment inside it. The benefit is complete isolation of the project from the surrounding projects and from the Python packages installed at OS level, plus the ability to install packages without administrator permissions. It also provides an easy way to transfer the project from one system to another using a requirements file.
The src folder is where I keep everything that is going to enter version control: the project source, the requirements file, the web server configuration etc.
My default .gitignore is made to skip the pyc-files, the PyDev files and everything in the static and media directories.
The media directory is where the MEDIA_ROOT setting points to; similarly, static corresponds to STATIC_ROOT.
All required packages with their versions are placed in required_packages.txt so we can install/update them with a single command in the virtual environment.
The Python code resides in a directory named after the project. Inside it, the structure partly follows the new project layout introduced in Django 1.4.
The big difference here is the settings part. It is moved to a separate module where all common/general settings are placed in __init__.py. There is one file for each possible environment (development, staging, production) with the environment-specific settings like DEBUG and TEMPLATE_DEBUG. The settings specific to the machine where the project is running are placed in local.py. This file should not go into the repository, as this is where the database password and API keys should reside.
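To illustrate the idea, here is a rough sketch of how such a settings module can be wired (the exact imports and values are assumptions for the example; "myproject" stands for <project_name>):

# myproject/settings/__init__.py - the common/general settings
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.staticfiles',
)

# myproject/settings/production.py - one file per environment type
from myproject.settings import *

DEBUG = False
TEMPLATE_DEBUG = DEBUG

# myproject/settings/local.py - machine specific, kept out of the repository
from myproject.settings.production import *  # or whichever environment runs on this machine

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'myproject',
        'USER': 'myproject',
        'PASSWORD': 'kept-out-of-the-repo',
        'HOST': 'localhost',
    }
}
API_KEY = 'also-kept-out-of-the-repo'

# with this wiring DJANGO_SETTINGS_MODULE would point to myproject.settings.local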
I keep one local Nginx configuration file because I use the web server to serve the static files when working locally - this is <project_name>.development.nginx.local.conf.
For each environment there is also a couple of configuration files - one for the web server (<project_name>.<environment_type>.nginx.uwsgi.conf) and one for uWSGI (<project_name>.<environment_type>.uwsgi.conf). I make symbolic links pointing to these files, so any changes made are automatically pushed/pulled via version control and I only have to reload the configuration. There is no way to change something in the configuration on one station and forget to transfer it to the rest.

Benefits

  • Complete isolation from other projects using virtual environment
  • Easy to transfer and update packages on other machines thanks to the pip requirements file
  • Media/static files outside the source directory for higher security
  • Web server/uWSGI configuration as part of the repository for easier and error proof synchronization

Probably it has some downsides (I cannot think of any right now, but you never know), so if you think this can be improved feel free to share your thoughts. Also, if anything is not clear enough, just ask me and I will be happy to clarify it.

Automated deployment with Ubuntu, Fabric and Django

Published at Sept. 18, 2012

A few months ago I started to play with Fabric and the result was a simple script that automates the creation of a new Django project. In the following months I continued my experiments and extended the script into a full stack for creating and deploying Django projects.

As the details behind the script - like the project structure that I use and the server setup - are a bit long, I will keep this post focused on the script usage and will write a follow-up describing the project structure and the server.

So, in brief, the setup that I use consists of Ubuntu as OS, Nginx as web server and uWSGI as application server. The last one is controlled by Upstart. The script is available for download at GitHub.
While waiting for more detailed documentation, here is a short description of the main tasks and what they do:

startproject:<project_name>

  • Creates a new virtual environment
  • Installs the predefined packages (uWSGI, Django and South)
  • Creates a new Django project from the predefined template
  • Creates different configuration files for development and production environment
  • Initializes new git repository
  • Prompts the user to choose a database type. Then installs the packages the database requires, creates a new database and user with the name of the project and generates the local settings file
  • Runs syncdb (if a database is selected) and collectstatic

setup_server

  • Installs the required packages such as Nginx, pip, GCC etc.
  • Prompts for a database type and installs it
  • Reboots the server as some of the packages may require it

Once you have a ready server and a project, just use the following task to deploy it to the server. Please have in mind that it should be used only for the initial deployment.

deploy_project:<project_name>,<env_type>,<project_repository>

  • Creates a new virtual environment
  • Clones the project from the repository
  • Installs the required packages
  • Creates symbolic links to the nginx/uwsgi configuration files
  • Asks for a database engine, creates a new DB/user with the project name and updates the settings file
  • Calls the update_project task

update_project:<project_name>,<env_type>

  • Updates the source code
  • Installs the required packages (if there are new ones)
  • Runs syncdb, migrate and collectstatic
  • Restarts the Upstart job that runs uWSGI and the Nginx server (a rough sketch of such a task is shown below)
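The exact code lives in the repository; just to give an idea, a task along these lines would do the job. This is a rough sketch, not the script's actual code - the deploy path, virtualenv location and Upstart job name are assumptions:

from fabric.api import cd, prefix, run, sudo

def update_project(project_name, env_type):
    # env_type would select the right settings/uwsgi files in the real script
    project_root = '/home/deploy/%s' % project_name           # assumed location
    with cd('%s/src' % project_root):
        run('git pull')                                        # update the source
        with prefix('. %s/bin/activate' % project_root):       # work inside the virtualenv
            run('pip install -r required_packages.txt')        # new requirements, if any
            with cd(project_name):
                run('python manage.py syncdb --noinput')
                run('python manage.py migrate')
                run('python manage.py collectstatic --noinput')
    sudo('restart %s' % project_name)                          # the Upstart job running uWSGI
    sudo('/etc/init.d/nginx reload')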

The script is still in development, so use it at your own risk. Also, it reflects my own idea of a server/application setup (I am planning to describe it in more depth in a follow-up post), but I would really like it if you try it and give feedback. Feel free to fork, comment and advise.

The road to hell is paved with regular expressions ...

Published at July 31, 2012

... or what is the cost of using regular expressions for simple tasks

Regular expressions are one of the most powerful tools in computing I have ever seen. My previous post about Django compressor and image preloading is a good example of how useful they can be. The only limit to their use is your imagination. But "with great power comes great responsibility", or in this case a great cost. Even the simplest expressions can be quite heavy compared with other methods.

The reason to write about this is a question recently asked in a Python group. It was about how to get the elements of a list that match a specific string. My proposal was to use a list comprehension and a simple string comparison, while another member proposed using a regular expression. I was pretty sure that the regular expression was slower, but not sure exactly how much slower, so I made a simple test to find out.

import re
import timeit

my_list = ['abc-123', 'def-456', 'ghi-789', 'abc456', 'abc', 'abd']

def re_check():
    return [i for i in my_list if re.match('^abc$', i)]

t = timeit.Timer(re_check)
print 're_check result >>', re_check()
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

def simple_check():
    return [i for i in my_list if i=='abc']

t = timeit.Timer(simple_check)
print 'simple_check result >>', simple_check()
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

The results were 23.99 vs 1.41 usec/pass for the regular expression and the direct comparison respectively, i.e. the regexp was 17 times slower. The difference in the example above may be acceptable in some cases, but it grows with the size of the list. This is a simple example of how something really quick in a local version may take significant time in production and even break your application.
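Part of that overhead is the pattern lookup that re.match does on every call. Precompiling the expression with re.compile is a quick variation of the same test - it should shave off some of the cost, though the plain comparison is still expected to win:

import re
import timeit

my_list = ['abc-123', 'def-456', 'ghi-789', 'abc456', 'abc', 'abd']
abc_re = re.compile('^abc$')  # compile the pattern once

def compiled_re_check():
    return [i for i in my_list if abc_re.match(i)]

t = timeit.Timer(compiled_re_check)
print 'compiled_re_check result >>', compiled_re_check()
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)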

So, should you learn and use regular expressions?
Yes! Absolutely!

They are powerful and useful. They will open your mind and allow you to do things you haven't done before. But remember that they are a double-edged razor and should be used cautiously. If you can avoid them with another comparison (one or more), just run a quick test to see which is faster. Of course, if you cannot avoid them you can also think about caching the results.

Python is not a Panacea ...

Published at June 11, 2012

... neither is any other language or framework

This post was inspired by the recurring discussion on the topic "Python vs some other language" (in this specific case the other one was PHP, and the question was asked in a Python group, so you may guess whether there were any answers in favor of PHP). It is very simple: I believe that every Python developer will tell you that Python is the greatest language ever built, how easy it is to learn, how readable and flexible it is, how much fun it is to work with and so on. They will tell you that you can do everything with it: web and desktop development, testing, automation, scientific simulations etc. But what most of them will forget to tell you is that it is not a panacea.

As a matter of fact, you can build "ugly" and unstable applications in Python too. Most problems come not from the language or framework used, but from bad coding practices and a bad understanding of the environment. Python will force you to write readable code, but it won't solve all your problems. It is hard to make a complete list of what exactly you must know before starting to build an application - a big part of the knowledge comes with experience - but here is a small list of some essential things.

  • Write clean code with meaningful variable/class names.
  • Exceptions are raised, learn how to handle them.
  • Learn OOP(Object Oriented Programming)
  • Use functions to granulate functionality and make code reusable
  • DRY (Don't Repeat Yourself)
  • If you are going to develop web applications, learn about the client-server relationship
  • Use "layers" to separate the different parts of your application - database methods, business logic, output etc. MVC is a nice example of such separation
  • Never store passwords in plain text. Even hashed passwords are not completely safe, check what Rainbow Tables are.
  • Comment/Document your code
  • Write unit tests and learn TDD.
  • Learn how to use version control.
  • There is a client waiting on the other side - don't make him wait too long.
  • Learn functional programming.

I hope the above does not sound like anti-Python talk. That is not its idea. Firstly, because there are things that are more important than the language itself (the list above) and secondly because... Python is awesome )))
There are languages that will help you learn the things above faster, and Python is one of them - built-in documentation features, easy to learn and try, and extremely useful. My advice is not to start with PHP as your first programming language - it will make you think that mixing variables of different types is OK. It may be fast for some things, but most of the time it is not safe, so you had better start with a more type-strict language where you can learn casting, escaping user output etc.

Probably I have missed a few (or more) points, but I hope I've covered the basics. If you think that anything important is missing, just add it in the comments and I will update the post.

Fabric & Django

Published at May 28, 2012

Or how to automate the creation of new projects with a simple script

Preface: Do you remember all the tiny little steps that you have to perform every time you start a new project - create a virtual environment, install packages, start and set up a Django project? Kind of annoying repetition, isn't it? How about automating it a bit?

Solution: Recently I started learning Fabric and thought "What better way to test it in practice than automating a simple, repetitive task?". So, let's list the tasks that I want the script to perform:

  1. Create virtual environment with the project name
  2. Activate the virtual environment
  3. Download list of packages and install them
  4. Make 'src' directory where the project source will reside
  5. Create new Django project in source directory
  6. Update the settings

Thanks to the local command the first one was easy. The problem was with the second one. Obviously each local command runs autonomously, so I had to find a way to have the virtual environment activated for every command after that. Fortunately the prefix context manager works like a charm. I had some issues making it read and write in the paths I wanted, but voilà - now it works exactly as I want.
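Just to give an idea of the approach, the core of it looks roughly like this (a simplified sketch, not the full script - the package list and paths are illustrative):

from fabric.api import local, prefix

def start_project(project_name):
    # 1. create the virtual environment with the project name
    local('virtualenv %s' % project_name)
    # 2. everything below runs with the environment activated
    #    ("." instead of "source", since local() uses /bin/sh)
    with prefix('. %s/bin/activate' % project_name):
        # 3. install the needed packages (illustrative list)
        local('pip install Django South')
        # 4. make the src directory for the project source
        local('mkdir -p %s/src' % project_name)
        # 5. start the Django project inside it
        local('cd %s/src && django-admin.py startproject %s' % (project_name, project_name))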

The script is too long to place here, but it is publicly available at https://gist.github.com/2818562
It is quite simple to use: you only need Python, Fabric and virtualenv. Then just run the following command.

fab start_project:my_new_project

To Do: Here are few things that can be improved:

  • Read the packages from a file
  • Update urls.py to enable admin
  • Generate Nginx server block file

So this is my first try with Fabric. I hope that you will like it and find it useful. As always, any comments, questions and/or improvement ideas are welcome.

Simple Site Checker

Published at Jan. 30, 2012

... a command line tool to monitor your sitemap links

I had been thinking about making such a tool for a while and fortunately I found some time, so here it is.

Simple Site Checker is a command line tool that allows you to run a check over the links in your XML sitemap.

How it works: The script requires a single argument - a URL or a relative/absolute path to the XML sitemap. It loads the XML, reads all loc tags in it and starts checking the links in them one by one.
By default you will see no output unless there is an error - i.e. the script is unable to load the sitemap or a link check fails.
Using the verbosity argument you can control the output if you need more detailed information like elapsed time, checked links etc.
You can run this script through a cron-like tool and get an e-mail in case of an error.
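The core of the idea fits in a few lines - a simplified sketch of the approach (not the actual tool), assuming Python 2's urllib2 and the standard sitemap namespace:

import urllib2
from xml.etree import ElementTree

SITEMAP_NS = '{http://www.sitemaps.org/schemas/sitemap/0.9}'

def check_sitemap(sitemap_url):
    errors = []
    tree = ElementTree.parse(urllib2.urlopen(sitemap_url))   # load the sitemap XML
    for loc in tree.findall('.//%sloc' % SITEMAP_NS):         # every <loc> tag holds a URL
        url = loc.text.strip()
        try:
            urllib2.urlopen(url)
        except urllib2.URLError, e:
            errors.append((url, e))                           # collect only the failures
    return errors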

I will appreciate any user input and ideas so feel free to comment.

Faking attributes in Python classes...

Published at Jan. 30, 2012

... or how to imitate dynamic properties in a class object

Preface: When you have connections between your application and other systems, the data is frequently not in the most useful form for your needs. Having an API is awesome, but sometimes it just does not act the way you want and your code quickly becomes a series of repeating API calls like api.get_product_property(product_id, property).
Of course it will be easier if you can use objects to represent the data in your code, so you can create something like a proxy class to this API:

class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    @property
    def name(self):
        return api_obj.get_product_property(self.id, 'name')

    @property
    def price(self):
        return api_obj.get_product_property(self.id, 'price')

#usage
product = Product(product_id)
print product.name
In my opinion this is cleaner, prettier and more useful than the direct API calls. But still there is something not quite right.

Problem: Your model has not two but twenty properties. Defining twenty methods does not make the code look good, not to mention that amending the code every time you need a new property is quite boring. So is there a better way? As I mentioned at the end of Connecting Django Models with outer applications, if you have a class that plays the role of a proxy to another API or other data, it may be easier to overwrite the __getattr__ method.

Solution:
class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    def __getattr__(self, key):
        return api_obj.get_product_property(self.id, key)

#usage
product = Product(product_id)
print product.name

Now you can directly use the product properties as attribute names of the Product class. Depending on the way the API works, it would be good to raise AttributeError if there is no such property for the product.
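For example, assuming the API returns None for unknown properties (that part is an assumption - your API may signal it differently), it could look like this:

class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    def __getattr__(self, key):
        value = api_obj.get_product_property(self.id, key)
        if value is None:  # assumed way to detect a missing property
            raise AttributeError('Product has no property %r' % key)
        return value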

Connecting Django Models with outer applications

Published at Jan. 23, 2012

Preface: Sometimes parts of the data that you have to display in your application reside outside the Django models. A simple example is the following case - the client requires that you build them a webshop, but they already have a CRM solution that holds their product info. Of course they provide you with a mechanism to read this data from their CRM.

Specialty: The problem is that the data in their CRM does not hold some of the product information that you need. For instance, it is missing an SEO-friendly description and a product image. So you will have to set up a model on your side and store this data there. Joining the two is easy - the only thing you need is a simple unique key for every product.

Solution: Here we use the product_id field to make the connection between the CRM data and the Django model.

# in models.py
from django.db import models
from django.utils.translation import ugettext_lazy as _
from filer.fields.image import FilerImageField


class Product(models.Model):
    product_id = models.IntegerField(_('Original Product'),
                                     unique=True)
    description = models.TextField(_('SEO-friendly Description'),
                                   blank=True)
    pod_image = FilerImageField(verbose_name=_('Product Image'),
                                blank=True, null=True)

    @property
    def name(self):
        return crm_api.get_product_name(self.product_id)

# in forms.py
from django import forms

from models import Product


class ProductForm(forms.ModelForm):
    name = forms.CharField(required=False,
                           widget=forms.TextInput(attrs={
                               'readonly': True,
                               'style': 'border: none'}))

    class Meta:
        model = Product
        widgets = {
            'product_id': forms.Select(),
        }
    def __init__(self, *args, **kwargs):
        super(ProductForm, self).__init__(*args, **kwargs)
        self.fields['product_id'].widget.choices = crm_api.get_product_choices()
        if self.instance.id:
            self.fields['name'].initial = self.instance.name

The form here should be used in the admin (add/edit) page of the model. We define that the product_id field will use the select widget, and we use a method that connects to the CRM and returns the product choices list.
The "self.instance.id" check is used to fill the name field for products that are already saved.

Final words: This is a very simple example, but its idea is to show the basic way to connect your models with another app. I strongly recommend using caching if your CRM data is not modified very often, in order to save some bandwidth and speed up your application.
Also, if you have multiple fields it may be better to overwrite the __getattr__ method instead of defining a separate property for each field you need to pull from the outer application.
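For example, the name property from the model above could be wrapped with Django's cache framework roughly like this (the key format and the one hour timeout are arbitrary choices):

from django.core.cache import cache

class Product(models.Model):
    # ... fields as in the example above ...

    @property
    def name(self):
        key = 'crm_product_name_%d' % self.product_id
        name = cache.get(key)
        if name is None:
            # hit the CRM only when the value is not cached yet
            name = crm_api.get_product_name(self.product_id)
            cache.set(key, name, 60 * 60)  # keep it for an hour
        return name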

P.S. Thanks to Miga for the code error report.

Python os.stat times are OS specific ...

Published at Nov. 21, 2011

... no problem cause we have stat_float_times

Preface: recently I downloaded PyTDDMon on my office station and what a surprise - file changes did not affect the monitor. So I ran a quick debug to check what was going on: the monitor was working, running a check every 5 seconds as documented, but it did not refresh on file changes.

Problem: it appeared that the hashing algorithm always returned the same value no matter whether the files were changed or not. The calculation is based on three factors: file path, file size and time of last modification. I was changing a single file, so the problem was in the last one. Another quick check and another surprise - os.stat(filename).st_mtime returns a different value after the file is changed, but the total hash stays unchanged. The problem was caused by st_mtime returning a floating point value instead of an integer.

Solution: Unfortunately the os.stat results appeared to be OS specific. Fortunately the type of the returned result is configurable. Calling os.stat_float_times(False) in the main module forces it to return integer values.
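A quick way to see the difference (the file name is just a placeholder; the actual values depend on the file and OS):

import os

os.stat_float_times(True)
print os.stat('some_file.py').st_mtime   # e.g. 1321868938.77 - a float

os.stat_float_times(False)
print os.stat('some_file.py').st_mtime   # 1321868938 - a plain integer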

Final words: these OS/version/settings things can always surprise you. And most of the time it is not a pleasant surprise. Unfortunately, testing the code on hundreds of different workstations is not an option. But this is one of the good sides of open source software: you download it, test it, debug it and have fun.
Also special thanks to Olof (the man behind the pytddmon concept) for the tool, the quick response to my messages and for putting my fix in the repo so fast. I hope that this has saved someone's nerves )

P.S. If you are using Python and like TDD - PYTDDMON is the right tool for you.

Retrieving Google Analytics data with Python...

Published at Sept. 6, 2011

... or how to pull data about page visits instead of implementing a custom counter

Preface: OK, so you have a website, right? And you are using Google Analytics to track your page views, visitors and so on? (If not, you should consider starting to use it. It is awesome, free and has lots of features such as custom segments, map overlay, AdSense integration and many more.)
So you know how many people have visited each page of your website, the bounce rate, the average time they spend on a page etc. And this data is only for you, or for the people you have granted access to.

Google Analytics

Problem: But what happens if one day you decide to show public statistics about visitors on your website? For example: how many people have opened the "Product X" page?
Of course you can add a custom counter that increases the views each time the page is opened. Developed, tested and deployed in no time. Everyone is happy until one day someone's cat takes a nap on their keyboard and "accidentally" keeps the F5 button pressed for an hour. The result is simple - one of your pages has 100 times more visits than the others. OK, you can fix this by adding cookies, IP tracking etc. But all this is reinventing the wheel. You already have all this data in your Google Analytics - the only thing you have to do is reach out and take it.

Solution: In our case "the hand" will be an HTTP request via the Google Data API. First you will need to install the Python version of the API:

sudo easy_install gdata
Once you have the API installed you have to build a client and authenticate:
import gdata.analytics.client

SOURCE_APP_NAME = 'The-name-of-your-app'
my_client = gdata.analytics.client.AnalyticsClient(source=SOURCE_APP_NAME)
my_client.client_login(
    'USERNAME',
    'PASSWORD',
    source=SOURCE_APP_NAME,
    service=my_client.auth_service,
    account_type = 'GOOGLE',
)

token = my_client.auth_token
SOURCE_APP_NAME is the name of the application that makes the request. You can set it to anything you like. After you build the client you must authenticate using your Google account. If you have both a Google and a Google Apps account with the same username, be sure to provide the correct account_type. Now you are authenticated and it is time to build the request. Obviously you want to filter the data according to some rules. The easiest way is to use the Data Feed Query Explorer to build and test your filter and then port it to the code. Here is an example of how to get the data about the page views for a specific URL for a single month (remember to update PROFILE_ID according to your profile).
account_query = gdata.analytics.client.AccountFeedQuery()
data_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:PROFILE_ID',
    'dimensions': '', #ga:source,ga:medium
    'metrics': 'ga:pageviews',
    'filters': 'ga:pagePath==/my_url_comes_here/',
    'start-date': '2011-08-06',
    'end-date': '2011-09-06',
    'prettyprint': 'true',
    })

feed = my_client.GetDataFeed(data_query)
result = [(x.name, x.value) for x in feed.entry[0].metric]

Final words: As you see, it is relatively easy to get the data from Google, but remember that this code makes two requests to Google each time it is executed, so you will need to cache the result. The GA data is not real-time, so you may automate the process to pull the data (if I remember correctly, it is updated once an hour) and store the results on your side, which will really improve the speed. Also keep in mind that this is just an example of how to use the API - instead of pulling the data page by page (as shown above) you may pull the results for multiple URLs at once and process the feed to get your data. It is all in your hands.
Do you have something to add? Cool, I am always open to hearing (reading) your comments and ideas.

Update: If you are using Django, you should consider using Memcached to cache these results, as shown in Caching websites with Django and Memcached.
