Posts tagged with Development

Simple Site Checker

Published at Jan. 30, 2012

... a command line tool to monitor your sitemap links

I had been meaning to build a tool like this for a while, and fortunately I finally found some time, so here it is.

Simple Site Checker is a command line tool that allows you to run a check over the links in your XML sitemap.

How it works: The script requires a single argument - a URL or a relative/absolute path to an XML sitemap. It loads the XML, reads all loc tags in it and starts checking the links in them one by one.
By default you will see no output unless there is an error, i.e. the script is unable to load the sitemap or a link check fails.
Using the verbosity argument you can control the output if you need more detailed information such as elapsed time, checked links etc.
You can run this script through a cron-like tool and get an e-mail in case of an error.
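
The flow described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual source of Simple Site Checker: the function names (extract_links, fetch_status, check_links) are made up for this example, it is written for Python 3, and it assumes that any status other than 200 counts as a failure.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_links(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    links = []
    for element in root.iter():
        # Accept both namespaced and plain <loc> tags.
        if element.tag in (SITEMAP_NS + "loc", "loc") and element.text:
            links.append(element.text.strip())
    return links


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except urllib.error.URLError:
        return None


def check_links(links, fetch=fetch_status):
    """Return (url, status) pairs for every link that did not return 200."""
    failures = []
    for url in links:
        status = fetch(url)
        if status != 200:
            failures.append((url, status))
    return failures
```

Passing the fetch function as a parameter keeps the checking loop easy to test without network access. A cron job could then print the failures and rely on cron's own e-mail notification.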

I will appreciate any user input and ideas so feel free to comment.

Faking attributes in Python classes...

Published at Jan. 30, 2012

... or how to imitate dynamic properties in a class object

Preface: When your application talks to other systems, the data is frequently not in the most useful form for your needs. Having an API is awesome, but sometimes it just does not act the way you want and your code quickly becomes a series of repeated API calls like api.get_product_property(product_id, property).
Of course it is easier if you can use objects to represent the data in your code, so you can create something like a proxy class for this API:

class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    @property
    def name(self):
        return api_obj.get_product_property(self.id, 'name')

    @property
    def price(self):
        return api_obj.get_product_property(self.id, 'price')

#usage
product = Product(product_id)
print product.name

In my opinion this is cleaner, prettier and more useful than the direct API calls. But still there is something not quite right.

Problem: Your model has not two but twenty properties. Defining 20 methods does not make the code look good. Not to mention that amending the code every time you need a new property is quite boring. So is there a better way? As I mention at the end of Connecting Django Models with outer applications, if you have a class that plays the role of a proxy to another API or other data, it may be easier to override the __getattr__ method.

Solution:

class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    def __getattr__(self, key):
        return api_obj.get_product_property(self.id, key)

#usage
product = Product(product_id)
print product.name

Now you can use the product properties directly as attribute names of the Product class. Depending on how the API works, it would be good to raise AttributeError if there is no such property for the product.
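
A minimal sketch of the AttributeError idea could look like the code below. FakeApi is a hypothetical stand-in for the real API object, and the assumption that it raises KeyError for an unknown property is mine; a real API will signal a missing property in its own way, so adapt the except clause accordingly.

```python
class FakeApi(object):
    """Hypothetical stand-in for the remote API used in the examples above."""

    def __init__(self, data):
        self._data = data

    def get_product_property(self, product_id, key):
        # Assumption: an unknown property raises KeyError.
        return self._data[product_id][key]


class Product(object):
    def __init__(self, api, product_id):
        self.api = api
        self.id = product_id

    def __getattr__(self, key):
        # __getattr__ is only called when normal attribute lookup fails,
        # so self.api and self.id never reach this method.
        if key.startswith('__'):
            # Do not forward special-method lookups (copy, pickle) to the API.
            raise AttributeError(key)
        try:
            return self.api.get_product_property(self.id, key)
        except KeyError:
            raise AttributeError(
                '%s has no property %r' % (type(self).__name__, key))
```

Raising AttributeError keeps hasattr() and getattr() with a default value working as expected on the proxy object.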

Connecting Django Models with outer applications

Published at Jan. 23, 2012

Preface: Sometimes parts of the data that you have to display in your application reside outside of the Django models. A simple example is the following case: the client requires that you build them a webshop, but they already have a CRM solution that holds their product info. Of course they provide you with a mechanism to read this data from their CRM.

Specialty: The problem is that the data in their CRM does not hold some of the product information that you need. For instance, it lacks an SEO-friendly description and a product image. So you will have to set up a model on your side and store this data there. It is easy to join them; the only thing that you need is a simple unique key for every product.

Solution: Here we use the product_id field to make the connection between the CRM data and the Django model.

# in models.py
class Product(models.Model):
    product_id = models.IntegerField(_('Original Product'),
                                     unique=True)
    description = models.TextField(_('SEO-friendly Description'),
                                   blank=True)
    pod_image = FilerImageField(verbose_name=_('Product Image'),
                                blank=True, null=True)

    @property
    def name(self):
        return crm_api.get_product_name(self.product_id)

# in forms.py
class ProductForm(forms.ModelForm):
    name = forms.CharField(required=False,
                           widget=forms.TextInput(attrs={
                               'readonly': True,
                               'style': 'border: none'}))

    class Meta:
        model = Product
        widgets = {
            'product_id': forms.Select(),
        }
    def __init__(self, *args, **kwargs):
        super(ProductForm, self).__init__(*args, **kwargs)
        self.fields['product_id'].widget.choices = crm_api.get_product_choices()
        if self.instance.id:
            self.fields['name'].initial = self.instance.name

The form here should be used in the admin (add/edit) page of the model. We define that the product_id field will use the select widget, and we use a method that connects to the CRM and returns the list of product choices.
The "self.instance.id" check is used to fill the name field for products that are already saved.

Final words: This is a very simple example, but its idea is to show the basic way to connect your models with another app. I strongly recommend that you use caching if your CRM data is not modified very often, in order to save some bandwidth and to speed up your application.
Also, if you have multiple fields it may be better to override the __getattr__ method instead of defining a separate property for each value that you need to pull from the outer application.

P.S. Thanks to Miga for the code error report.

Python os.stat times are OS specific ...

Published at Nov. 21, 2011

... no problem cause we have stat_float_times

Preface: Recently I downloaded PyTDDMon on my office workstation and, what a surprise, file changes had no effect on the monitor. So I ran a quick debug session to check what was going on: the monitor was working, running a check every 5 seconds as documented, but it did not refresh on file changes.

Problem: It appeared that the hashing algorithm always returned the same value, no matter whether the files had changed or not. The calculation is based on three factors: file path, file size and time of last modification. I was changing a single file, so the problem was in the last one. Another quick check and another surprise: os.stat(filename).st_mtime returned a different value after the file was changed, but the total hash stayed unchanged. The problem was caused by st_mtime returning a floating point value instead of an integer.

Solution: Unfortunately the os.stat results turned out to be OS specific. Fortunately the type of the returned result is configurable. Calling os.stat_float_times(False) in the main module forces it to return integer values.
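
A note for readers on newer Python versions: os.stat_float_times() was deprecated in Python 3.3 and removed in Python 3.7, so the call above only works on older interpreters. A possible alternative is to use the integer st_mtime_ns field directly when building the change signature. The sketch below is not the PyTDDMon code; file_signature and tree_hash are names invented for this example.

```python
import os


def file_signature(path):
    """Return a (path, size, mtime) tuple that uses only integers for time."""
    info = os.stat(path)
    # st_mtime_ns is an integer number of nanoseconds (Python 3.3+).
    return (path, info.st_size, info.st_mtime_ns)


def tree_hash(paths):
    """Combine the signatures of several files into one hash value."""
    return hash(tuple(file_signature(p) for p in sorted(paths)))
```

Comparing tree_hash() results between two checks within the same process is enough to detect that a file was changed.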

Final words: These OS/version/settings things can always surprise you, and most of the time it is not a pleasant surprise. Unfortunately, testing the code on hundreds of different workstations is not an option. But this is one of the good sides of open source software: you download it, test it, debug it and have fun.
Also special thanks to Olof (the man behind the pytddmon concept) for the tool, for the quick response to my messages and for putting my fix in the repo so fast. I hope that this has saved someone's nerves.

P.S. If you are using Python and like TDD - PYTDDMON is the right tool for you.

Django CMS Plugins with selectable template ...

Published at Oct. 9, 2011

... or how to reuse your plugins inside sections with different design

Problem: Frequently on the websites I develop I need to display the same set of data in several different ways, for example a news box that needs to appear in different sections of the website, e.g. in the sidebar, in the main content etc. Using Django CMS plugins makes this quite easy.
For simplicity we will take the following case: an image/text pair with two layout variations - image on the left of the text and image on the right.

Django CMS Plugins

Same data but different layout. All you need to do is to allow your users to change the plugin template according to their needs. If you don't have experience with Django CMS plugins, I advise you to check how to create custom Django CMS plugins before you continue with the solution.

Solution: First you will have to create a tuple holding your templates (and their human-readable names) and add a field that will hold the chosen template to the plugin model.

#models.py
PLUGIN_TEMPLATES = (
  ('image_on_left.html', 'Image on left'),
  ('image_on_right.html', 'Image on right'),
)

class SamplePlugin(CMSPlugin):
    # your plugin properties here
    template = models.CharField('Template', max_length=255,
                                choices = PLUGIN_TEMPLATES)

Now it is time to tweak the plugin's render method too:

#cms_plugins.py
class CMSSamplePlugin(CMSPluginBase):
    model = SamplePlugin
    name = 'Sample plugin'
    render_template = PLUGIN_TEMPLATES[0][0]
    
    def render(self, context, instance, placeholder):
        if instance and instance.template:
            self.render_template = instance.template
        #your stuff here
        return context

Final words: Yep, this is all. Simple, isn't it? It is amazing how useful such small things sometimes are. If the layouts of your templates differ more, you will probably have to put a little more stuff in the context that some of your templates may not need, but that is OK. Feel free to comment, and if you are using this "trick" please add your use case - it will be interesting to see in how many different cases this works.

Language redirects for multilingual sites with Django CMS ...

Published at Sept. 11, 2011

... or how to avoid duplicate content by keeping the current language in the URL

Preface: Earlier this year I posted about Django CMS 2.2 features that I want to see, and one of the things mentioned there was that once you have chosen the language of the site it does not matter whether you open "/my_page/" or "/en/my_page/" - it just shows the same content. The problem is that this can be considered both duplicate and inconsistent content.
Duplicate because you see the same content with and without the language code in the URL, and inconsistent because for the same URL you can get different language versions, i.e. different content.

Solution: This can easily be fixed by using a custom middleware that redirects URLs that do not contain a language code. In my case the middleware is stored in "middleware/URLMiddlewares.py" (the path is relative to my project root directory) and contains the following code.

from cms.middleware.multilingual import MultilingualURLMiddleware 
from django.conf import settings
from django.http import HttpResponseRedirect
from django.utils import translation

class CustomMultilingualURLMiddleware(MultilingualURLMiddleware): 
    def process_request(self, request):
        lang_path = request.path.split('/')[1]
        if lang_path in settings.URLS_WITHOUT_LANGUAGE_REDIRECT:
            return None
        language = self.get_language_from_request(request) 
        translation.activate(language) 
        request.LANGUAGE_CODE = language
        if lang_path == '': 
            return HttpResponseRedirect('/%s/' % language)
        if len([z for z in settings.LANGUAGES if z[0] == lang_path]) == 0:
            return HttpResponseRedirect('/%s%s' % (language, request.path))

Now a little explanation of what happens in this middleware. Note: If you are not familiar with how middlewares work, go and check Django Middlewares.

Back to the code. First we split the URL by '/', take the second element (this is where our language code should be) and store it in lang_path (8). URLS_WITHOUT_LANGUAGE_REDIRECT is just a list of first path segments that should not be redirected; if lang_path matches any of them we return None, i.e. the request is not changed (9-10). This is used for sections of the site that are not language specific, for example media files. Then we get the language based on the request (11-13). If lang_path is empty, the user has requested the home page and we redirect them to the correct language version of it (14-15). If lang_path does not match any of the declared languages, this means that the language code is missing from the URL and the user is redirected to the correct language version of the page (16-17).

To make the middleware above work you have to update your settings.py. First, add the middleware to your MIDDLEWARE_CLASSES - in my case the path is 'middleware.URLMiddlewares.CustomMultilingualURLMiddleware'. Second, add the URLS_WITHOUT_LANGUAGE_REDIRECT list and place there the URLs that should not be redirected, for example:

URLS_WITHOUT_LANGUAGE_REDIRECT = [
    'css',
    'js',
]
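
To make the redirect rules easier to follow, here is the same decision expressed as a plain function without any Django dependencies. It is only a sketch for illustration: language_redirect_target is a name invented for this example, and languages is assumed to be a list of language codes (in the middleware they come from the first element of each settings.LANGUAGES entry).

```python
def language_redirect_target(path, language, languages, skip_prefixes):
    """Return the URL to redirect to, or None when no redirect is needed."""
    # request.path always starts with '/', so index 1 is the first segment.
    first_segment = path.split('/')[1]
    if first_segment in skip_prefixes:
        return None                       # e.g. /css/... or /js/...
    if first_segment == '':
        return '/%s/' % language          # home page without language code
    if first_segment not in languages:
        return '/%s%s' % (language, path)  # page without language code
    return None                           # language code already present
```

For example, with language 'en' the path '/my_page/' leads to a redirect to '/en/my_page/', while '/en/my_page/' and '/css/site.css' are left untouched.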

Specialties: If the language code is not in the URL and there is no language cookie set, your browser settings will be used to determine your preferred language. Unfortunately most users do not know about this option and it often stays set to its default value. If you want this setting to be ignored, just add the following code after line 10 in the middleware above:

if request.META.has_key('HTTP_ACCEPT_LANGUAGE'):
    del request.META['HTTP_ACCEPT_LANGUAGE']

It removes the HTTP_ACCEPT_LANGUAGE header sent by the browser, and Django uses the language set in its settings as the default.

URLS_WITHOUT_LANGUAGE_REDIRECT is extremely useful if you are developing using the built-in dev server and serve the media files through it. But once you put your website into production, I strongly encourage you to serve these files directly from the web server instead of using Django's static serve.

Final words: In Django 1.4 there will be big changes to multilingual URLs, but until then you can use this code to improve your website's SEO. Any ideas for improvement will be appreciated.

Retrieving Google Analytics data with Python...

Published at Sept. 6, 2011

... or how to pull data about page visits instead of implementing custom counter

Preface: OK, so you have a website, right? And you are using Google Analytics to track your page views, visitors and so on? (If not, you should consider starting to use it. It is awesome, free and has lots of features such as custom segments, map overlay, AdSense integration and many more.)
So you know how many people have visited each page of your website, the bounce rate, the average time they spend on the page etc. And this data is only for you or for a certain number of people to whom you have granted access.

Google Analytics

Problem: But what happens if one day you decide to show public statistics about the visitors of your website? For example: how many people have opened the "Product X" page?
Of course you can add a custom counter that increases the views each time the page is opened. Developed, tested and deployed in no time. Everyone is happy until one day someone's cat takes a nap on the keyboard and "accidentally" keeps the F5 button pressed for an hour. The result is simple - one of your pages has 100 times more visits than the others. OK, you can fix this by adding cookies, IP tracking etc. But all this is reinventing the wheel. You already have all this data in your Google Analytics; the only thing you have to do is reach out and take it.

Solution: In our case "the hand" will be an HTTP request via the Google Data API. First you will need to install the Python version of the API:

sudo easy_install gdata

Once you have the API installed, you have to build a client and authenticate:

SOURCE_APP_NAME = 'The-name-of-your-app'
my_client = gdata.analytics.client.AnalyticsClient(source=SOURCE_APP_NAME)
my_client.client_login(
    'USERNAME',
    'PASSWORD',
    source=SOURCE_APP_NAME,
    service=my_client.auth_service,
    account_type = 'GOOGLE',
)

token = my_client.auth_token

SOURCE_APP_NAME is the name of the application that makes the request. You can set it to anything you like. After you build the client (2) you must authenticate using your Google account (3-9). If you have both a Google and a Google Apps account with the same username, be sure to provide the correct account type (8).

Now that you have authenticated, it is time to build the request. Obviously you want to filter the data according to some rules. The easiest way is to use the Data Feed Query Explorer to build and test your filter, and then to port it to the code. Here is an example of how to get the page views for a specific URL for a single month (remember to update PROFILE_ID according to your profile).

account_query = gdata.analytics.client.AccountFeedQuery()
data_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:PROFILE_ID',
    'dimensions': '', #ga:source,ga:medium
    'metrics': 'ga:pageviews',
    'filters': 'ga:pagePath==/my_url_comes_here/',
    'start-date': '2011-08-06',
    'end-date': '2011-09-06',
    'prettyprint': 'true',
    })

feed = my_client.GetDataFeed(data_query)
result = [(x.name, x.value) for x in feed.entry[0].metric]

Final words: As you see, it is relatively easy to get the data from Google, but remember that this code makes two requests to Google each time it is executed, so you will need to cache the result. The GA data is not real-time, so you may automate the process of pulling the data (if I remember correctly the data is updated once an hour) and store the results on your side, which will really improve the speed. Also keep in mind that this is just an example of how to use the API: instead of pulling the data page by page (as shown above) you may pull the results for multiple URLs at once and process the feed to get your data. It is all in your hands.
Do you have something to add? Cool, I am always open to hear (read) your comments and ideas.

Update: If you are using Django you should consider using Memcached to cache these results, as shown in Caching websites with Django and Memcached.
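
If you are not using Django, the same idea can be sketched with a tiny in-process cache that keeps each result for a fixed time. This is an illustration only: TTLCache is a name made up for this example, it is not part of gdata, and it is not shared between processes the way Memcached is.

```python
import time


class TTLCache(object):
    """Tiny in-process cache; each entry expires ttl seconds after being set."""

    def __init__(self, ttl, clock=time.time):
        self.ttl = ttl
        self.clock = clock          # injectable for testing
        self._entries = {}          # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]         # still fresh, skip the expensive call
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value
```

With an hourly TTL, a call such as cache.get_or_compute(url, fetch) would hit Google at most once an hour per URL, where fetch is whatever function wraps the GetDataFeed request shown above.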

Foreign key to Django CMS page ...

Published at July 14, 2011

... how to make usable drop-downs with Django CMS pages

Problem: Sometimes when you create custom applications or plugins for Django CMS you need a property that connects the current item to a page in the CMS. Nothing simpler than this - you just add a ForeignKey to your model that points to the Page model and everything is (almost) fine. Example:

from django.db import models
from cms.models import Page

class MyModel(models.Model):
    # some model attributes here
    page = models.ForeignKey(Page)    

If you register your model in the Django admin or just add a model form to it, you will see something like this:

Django Admin Screenshot

Cool, right? Not exactly. The problem is that these pages are in a hierarchical structure and listing them in a flat list is a little confusing. So let's indent them according to their level in the hierarchy.

Solution: The easiest way to achieve this indentation is to override the choices list of the ForeignKey field in the ModelForm __init__ method.

class MyModelForm(forms.ModelForm):
    class Meta:
        model = MyModel
    
    def __init__(self, *args, **kwargs):
        super(MyModelForm, self).__init__(*args, **kwargs)
        choices = [self.fields['page'].choices.__iter__().next()]
        for page in self.fields['page'].queryset:
            choices.append(
                (page.id, ''.join(['-'*page.level, page.__unicode__()]))
            )
        self.fields['page'].choices = choices 

The magic lies between lines 7 and 11. On line 7 we create a list with one element - the default empty option for the drop-down. The need to use "__iter__().next()" comes from the fact that the choices attribute of the field is a django.forms.models.ModelChoiceIterator object, which is iterable but not indexable, i.e. you cannot just use self.fields['page'].choices[0].
Now that we have the empty choice, it is time to add the real ones, so we iterate over the queryset (line 8) that holds them and for each item we add a tuple to our choices list (10). The first item of the tuple is the page id - nothing special, but the second one... here comes the Python magic. We multiply the minus sign ('-') by the page level and join the result with the page title. The only thing left is to replace the field choices (line 12) and here is the result:

Django Admin - usable drop-down

Final words: For me this is much more usable than the flat list. Of course you can modify the queryset to return only published pages or filter the results in some other way and still use the indentation code from above.
I'll be happy to hear your thoughts on this.
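
The indentation trick itself does not depend on Django, so here is a small standalone sketch of it. The pages are represented as plain (id, level, title) tuples, which is an assumption made for this example; in the form above the values come from page.id, page.level and the page title. It also shows next(iter(...)), which does the same job as "__iter__().next()" and works on both Python 2.6+ and Python 3.

```python
def first_item(iterable):
    """Return the first item of any iterable, indexable or not."""
    return next(iter(iterable))


def indented_choices(pages, empty_choice=('', '---------')):
    """Build drop-down choices with each label prefixed by '-' * level."""
    choices = [empty_choice]
    for page_id, level, title in pages:
        # Multiplying a string by an integer repeats it: '-' * 2 == '--'.
        choices.append((page_id, '-' * level + title))
    return choices
```

The default empty_choice value used here mirrors the empty label that Django shows for optional choice fields; pass your own tuple if your form uses a different one.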

Caching websites with Django and Memcached...

Published at July 12, 2011

... memcached installation, django configuration and how to use it for your website

After Caching web sites and web applications and When to use caching for web sites, it's time for a little sample. This sample shows the usage of Memcached as a server-side application cache. The installation part is taken from my Ubuntu, so it may differ depending on your OS/distribution.

What is Memcached: Memcached is a tool that allows you to store key-value pairs in your memory. The keys are limited to 250 bytes and, for better performance, the value size is limited to 1MB (more details), but this size is more than enough for web usage.

Memcached installation:

apt-get install memcached
apt-get install python-memcache

The first line installs Memcached and the second one installs the Python API for communication between your application and the Memcached daemon. After this the Memcached daemon is up and running. With the default configuration it runs on port 11211 on localhost (127.0.0.1). If you want to modify this, the configuration file (in my case) is located at /etc/memcached.conf.

Django configuration: This one depends on the Django version that you use. For 1.2.5 and earlier the following code should be added to your settings file (settings.py):

CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

For 1.3 and the development version add:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

In both cases, if you use a different port and/or IP you have to replace them above. More info about cache backend configuration can be found in the Django documentation.

So now you have Memcached running and Django configured. If you have doubts about whether this is suitable/usable in your case, take a look at the posts mentioned above or just add a comment with your case and I will be happy to give you advice. Now it is time to start using it.

Cache usage (part I) - how to cache on Python level: If you have some heavy calculations in your view, you can cache their result and reuse it to lower the load. Example:

from django.core.cache import cache

def heavy_view(request):
    cache_key = 'my_heavy_view_cache_key'
    cache_time = 1800 # time to live in seconds
    result = cache.get(cache_key)
    if result is None:
        result = heavy_calculation()  # placeholder for your own calculations
        cache.set(cache_key, result, cache_time)
    return result

The process is simple: you ask the cache for the value corresponding to a given key (line 6). If the result is None, you execute the code that generates it (line 8) and store it in the cache (line 9). My advice is to declare the key and time as variables because this will ease their future changes.

Cache usage (part II) - how to cache on template level: This is suitable for the cases when you have some heavy processing in the template (such as regroup) or you want to cache only part of the template (such as a latest news section). Example:

{% load cache %}
 ... non cached content here ...
{% cache 1800 latest_news %}
    ... here are latest news - cached ...
{% endcache %}

The basic usage is {% cache time_in_seconds key %} ... {% endcache %}

You can also cache code fragments based on dynamic properties, for example the current user's recent conversations: just pass a third parameter that uniquely identifies the fragment to be cached.

{% load cache %}
{% cache 300 recent_conversations request.user.id %}
    ... current user recent conversations - cached ...
{% endcache %}

Final words: As you see from the examples above, using Django and Memcached is really easy. Using it correctly will speed up your website and accordingly improve your user experience (UX) and SEO. Using it wrongly will produce negative results. Just take a moment and think about what can be cached, how long it can be cached and whether there is a reason to cache it. Try to avoid double caching - there is no need to use caching in templates and then cache the rendered template in the view too.

When to use caching for web sites ...

Published at July 12, 2011

... five major questions to ask yourself before using cache

After we learned about the Why, Where, What and How of caching web sites, now it is time to see when to use it.
The application cache is mainly used for speeding up and/or decreasing the load of frequently executed and (but not necessarily) resource-heavy methods. So the first question is:

1) If you have a method that consumes lots of CPU time and/or memory, can it be optimised?

If you can optimise your code and make the method run faster and consume fewer resources, then do this first and then reconsider whether you still need to cache it.

2) Will caching save you load/wait?

Keep in mind that accessing the cache has its own cost, so caching the result of relatively light operations is pointless. Try to find where your biggest load/wait comes from and use the cache there.

3) For how long should the cache be valid?

This depends on how often the data changes. We can split it into 3 major cases.

Case 1: The data changes at regular intervals, for example it is updated by cron. In this case you can set the expiry time equal to the interval at which the cron job runs. Example - you are reading a feed with the news from the last hour.

Case 2: The data changes at random intervals, but not in real time (i.e. from minutes to hours). In this case you should choose an average amount of time for which you think the data stays unchanged, or for which you can serve outdated data to the client even if it has changed. The best approach here is to be able to invalidate the cache when the data changes. This is usable if the data is of a single type, for example a news list. Just add a line that invalidates the cache on every operation that affects the news (add/edit/delete). If you have composite data, for example a mix of news, weather forecast and currency rates for the day, you'll just have to wait for the cache to expire by itself.

Case 3: The data is updated in real time (every second, sometimes more than once a second). If you are watching/playing on the market you really need real-time info for the current rates. But if you don't need the real-time data, for example you are displaying just an informative graph with the daily changes, you can cache it and update it less often (for example every few minutes) than it changes.
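
The invalidation approach from Case 2 can be sketched like this. NewsStore is a made-up in-memory stand-in for a news model and a plain dict plays the role of the cache; with Django and Memcached the same idea would use the cache object's get, set and delete calls instead.

```python
class NewsStore(object):
    """In-memory stand-in for a news model with a cached news list."""

    CACHE_KEY = 'news_list'

    def __init__(self, cache):
        self.cache = cache      # any dict-like object
        self.items = []
        self.rebuilds = 0       # counts how often the list is rebuilt

    def add(self, title):
        self.items.append(title)
        self._invalidate()

    def delete(self, title):
        self.items.remove(title)
        self._invalidate()

    def _invalidate(self):
        # Drop the cached list so the next read rebuilds it.
        self.cache.pop(self.CACHE_KEY, None)

    def news_list(self):
        cached = self.cache.get(self.CACHE_KEY)
        if cached is None:
            self.rebuilds += 1
            cached = list(self.items)
            self.cache[self.CACHE_KEY] = cached
        return cached
```

Every operation that changes the news drops the cached list, so readers never see outdated data and the list is rebuilt only once per change.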

So you have determined how long your cache should stay valid and the next question is:

4) Will the data be accessed more than once for the cache period?

If not, then don't cache it, or consider increasing the cache validity time (question 3).

5) How big is the data to be cached?

Memcached has a 1MB limit per cached item. For the web (no matter whether it is a site or an application) this limit is enough for most/all cases. If you plan to store something bigger, think again and make sure you are not doing something wrong. If you really need to cache a bigger amount of data, consider using another cache storage - database or file system.
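
A rough way to check whether a value is likely to fit is to measure its pickled size before caching it. This is only an estimate under my own assumptions: fits_in_memcached is a name invented for this example, the actual client library may serialize or compress values differently, and the stored item also includes the key and some overhead.

```python
import pickle

MEMCACHED_ITEM_LIMIT = 1024 * 1024  # default per-item limit of 1MB


def fits_in_memcached(value, limit=MEMCACHED_ITEM_LIMIT):
    """Return True if the pickled value is smaller than the item limit."""
    return len(pickle.dumps(value, pickle.HIGHEST_PROTOCOL)) < limit
```

If this returns False for the data you want to cache, that is a good moment to reconsider what exactly you are caching.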

If you are an advanced developer you will subconsciously know whether to use cache or not. But these questions may be really helpful if you are a novice.
I am open to hear what you ask yourself before using cache.

Update:

One more case when it is mandatory to use caching (if you can) is when you have frequent calls to an external API, for example Retrieving Google Analytics Data.
