Django, pytz, NonExistentTimeError and AmbiguousTimeError

Published at March 29, 2015

Brief: In one of the projects I work on we had to convert some old naive datetime objects to timezone-aware ones. Converting a naive datetime to a timezone-aware one is usually a straightforward job. In Django you even have a nice utility function for this. For example:

import datetime

import pytz
from django.utils import timezone


timezone.make_aware(datetime.datetime(2012, 3, 25, 3, 52),
                    timezone=pytz.timezone('Europe/Stockholm'))
# returns datetime.datetime(2012, 3, 25, 3, 52, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>)

Problem: You can use this for quite a long time until one day you end up with something like this:

timezone.make_aware(datetime.datetime(2012, 3, 25, 2, 52),
                    timezone=pytz.timezone('Europe/Stockholm'))

# which leads to
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/ilian/venvs/test/lib/python3.4/site-packages/django/utils/timezone.py", line 358, in make_aware
    return timezone.localize(value, is_dst=None)
  File "/home/ilian/venvs/test/lib/python3.4/site-packages/pytz/tzinfo.py", line 327, in localize
    raise NonExistentTimeError(dt)
pytz.exceptions.NonExistentTimeError: 2012-03-25 02:52:00

Or this:

    
timezone.make_aware(datetime.datetime(2012, 10, 28, 2, 52),
                    timezone=pytz.timezone('Europe/Stockholm'))

# throws
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/ilian/venvs/test/lib/python3.4/site-packages/django/utils/timezone.py", line 358, in make_aware
    return timezone.localize(value, is_dst=None)
  File "/home/ilian/venvs/test/lib/python3.4/site-packages/pytz/tzinfo.py", line 349, in localize
    raise AmbiguousTimeError(dt)
pytz.exceptions.AmbiguousTimeError: 2012-10-28 02:52:00

Explanation: The reason for the first error is that in the real world this datetime does not exist. Due to the DST change on this date the clock jumps from 01:59 standard time to 03:00 DST. Fortunately (or not) pytz is aware that this time is invalid and throws the exception above. The second exception is almost the same, but it happens when switching from summer to standard time. From 02:59 DST the clock shifts back to 02:00 standard time, so every time between 02:00 and 02:59 occurs twice and we end up with a duplicate time.

Why did this happen (in our case)? Well, we couldn't be sure exactly how this value got into our legacy data, but the assumption is that at the moment the record was saved the server was in a different timezone where this was a valid time.

Solution 1: This fix is quite simple: just add an hour if the exception occurs.

from datetime import datetime, timedelta

import pytz
from django.utils.timezone import make_aware

try:
    # the timezone belongs to make_aware, not to fromtimestamp
    date = make_aware(
        datetime.fromtimestamp(date_time),
        timezone=pytz.timezone('Europe/Stockholm')
    )
except (pytz.NonExistentTimeError, pytz.AmbiguousTimeError):
    date = make_aware(
        datetime.fromtimestamp(date_time) + timedelta(hours=1),
        timezone=pytz.timezone('Europe/Stockholm')
    )

Solution 2: Instead of calling make_aware, call the localize method of the pytz timezone directly.

try:
    date = make_aware(
        datetime.fromtimestamp(date_time),
        timezone=pytz.timezone('Europe/Stockholm')
    )
except (pytz.NonExistentTimeError, pytz.AmbiguousTimeError):
    # named tz so that it does not shadow django.utils.timezone
    tz = pytz.timezone('Europe/Stockholm')
    date = tz.localize(datetime.fromtimestamp(date_time), is_dst=False)

The second solution probably needs some explanation. First let's check what make_aware does. The code below is taken from Django's source code as of version 1.7.7:

def make_aware(value, timezone):
    """
    Makes a naive datetime.datetime in a given time zone aware.
    """
    if hasattr(timezone, 'localize'):
        # This method is available for pytz time zones.
        return timezone.localize(value, is_dst=None)
    else:
        # Check that we won't overwrite the timezone of an aware datetime.
        if is_aware(value):
            raise ValueError(
                "make_aware expects a naive datetime, got %s" % value)
        # This may be wrong around DST changes!
        return value.replace(tzinfo=timezone)    

To simplify, Django uses the localize method of the timezone object (if it exists) to convert the datetime. With pytz, this localize method takes two arguments: the datetime value and is_dst. The latter takes three possible values: None, False and True. With None, if the datetime falls on the moment of a DST change, pytz does not know how to handle it and you get one of the exceptions shown above. False means that the value should be treated as standard time and True that it should be treated as summer time.
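The three is_dst values can be tried outside Django with pytz alone. This is a minimal sketch that reuses the two Stockholm datetimes from the tracebacks above; the variable names are only illustrative.

```python
import datetime

import pytz

tz = pytz.timezone('Europe/Stockholm')

# 02:52 on 2012-10-28 happens twice: once in summer time, once in standard time.
ambiguous = datetime.datetime(2012, 10, 28, 2, 52)
summer = tz.localize(ambiguous, is_dst=True)     # treated as summer time (UTC+2)
standard = tz.localize(ambiguous, is_dst=False)  # treated as standard time (UTC+1)
print(summer.utcoffset(), standard.utcoffset())

# 02:52 on 2012-03-25 never happens: the clock skips from 01:59 to 03:00.
missing = datetime.datetime(2012, 3, 25, 2, 52)

# With is_dst=None pytz refuses to guess and raises instead.
raised = []
for value in (ambiguous, missing):
    try:
        tz.localize(value, is_dst=None)
    except (pytz.AmbiguousTimeError, pytz.NonExistentTimeError) as error:
        raised.append(type(error).__name__)
print(raised)
```

Passing is_dst=False, as in Solution 2, picks the standard-time reading for the ambiguous value instead of raising.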

Why isn't this fixed in Django? The simple answer is "because this is how it should work". For a longer one, check the respective ticket.

Reminder: Do not forget that the "fix" above does not actually care whether the original datetime is during DST or not. In our case this was not critical for our app, but in some other cases it might be, so use it carefully.

Thanks: Special thanks to Joshua, who correctly pointed out in the comments that I had missed the AmbiguousTimeError in the original post, which made me look a bit more into the problem, research other solutions and update the article to its current content.

Working with intervals in Python

Published at Feb. 22, 2015

Brief: Working with intervals in Python is really easy, fast and simple. If you want to learn more just keep reading.

Task description: Let's say that the case is the following: you have multiple users and each one of them has earned a different number of points on your website. You want to know how many users haven't got any points, how many have between 1 and 50 points, how many between 51 and 100, etc. In addition, at 1000 the intervals start increasing by 100 instead of 50.

Preparing the intervals: Working with lists in Python is so awesome, so creating the intervals is quite a simple task.

# Parentheses are used for the line continuation because a comment
# after a backslash is a syntax error.
intervals = (
    [0] +                                # The zero interval
    [x * 50 for x in range(1, 20)] +     # The 50 intervals
    [x * 100 for x in range(10, 100)] +  # The 100 intervals
    [x * 1000 for x in range(10, 102)]   # The 1000 intervals
)

So after running the code above we will have a list with the maximum number of points for each interval. Now it is time to prepare the different buckets that will store the users count. To ease this we are going to use a defaultdict.

from collections import defaultdict

buckets = defaultdict(lambda: 0)

This way, we can increase the count for each bucket without checking if it exists. Now let's get to counting:

import bisect

for user in users:
    try:
        bucket = intervals[bisect.bisect_left(intervals, user.points)]
    except IndexError:
        # we are over the last bucket, so we put it in the last one
        bucket = intervals[-1]

    # count the user inside the loop, once per user
    buckets[bucket] += 1

How it works: Well, it is quite simple. bisect.bisect_left uses a binary search to find the position where an item should be inserted to keep the list (in our case intervals) sorted. Using that position we take the value from intervals that represents the bucket where the specified number should go. And we are done. The result will look like:

    {
        1: 10,
        10: 5,
        30: 8,
        1100: 2
    }

Final words: As you can see, when the defaultdict is used it does not have values for the empty buckets. This can be good or bad depending on the requirements for presenting the data, but it can easily be fixed by using the items from intervals as keys for the buckets.
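As a rough illustration of that last point, here is a small self-contained sketch. The shortened intervals list and the point totals are made up for the example; the buckets are pre-filled from intervals so that empty ones still appear in the result.

```python
import bisect

# Upper bounds of a few hypothetical buckets (much shorter than the real list).
intervals = [0, 50, 100, 150]

# Made-up point totals, for illustration only.
points = [0, 1, 50, 51, 149, 5000]

# Pre-fill every bucket with zero so that empty buckets still show up.
buckets = dict.fromkeys(intervals, 0)

for value in points:
    # bisect_left returns the index of the first upper bound >= value.
    position = bisect.bisect_left(intervals, value)
    # Anything above the last upper bound is counted in the last bucket.
    position = min(position, len(intervals) - 1)
    buckets[intervals[position]] += 1

print(buckets)
```

Clamping the position with min() replaces the try/except IndexError from the loop above; both approaches put oversized values into the last bucket.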

P.S. Comments and ideas for improvement are always welcome.

Resurrection

Published at Feb. 22, 2015

Well, as some of you may have seen, this blog was on hold for quite a long time. There were multiple reasons, mainly my Ph.D. and changing my job, but it is back online.

So, what is new?

As a start, this blog is no longer running on WordPress. The reason is that I had some issues with WordPress - the blog was hacked twice due to security holes in WordPress/plugins, it was terribly slow and the code looked like shit. Lots of inline styles and JavaScript etc. So I made a simple Django based blog that generates static content. Also we have a new design and a new domain, the last one much easier to remember )))

Also the comments are now handled by Disqus and the search functionality is provided by Google. The code of the blog needs some minor cleaning and then it will be released publicly in the next few weeks. Meanwhile you can check my latest post Working with intervals in Python.

P.S. I have finally finished my Ph.D. so no more university/research job and hopefully more time for blogging.

Python Interview Question and Answers

Published at April 28, 2013

For the last few weeks I have been interviewing several people for Python/Django developer positions, so I thought that it might be helpful to show the questions I am asking together with the answers. The reason is ... OK, let me tell you a story first.
I remember when one of my university professors introduced us to his professor - the one who taught him. It was a really short visit but I still remember one of the things he said: "Ignorance is not bad, the bad thing is when you do not want to learn."
So back to the reason - if you have at least taken care to prepare for the interview, looked for standard questions and their answers and learned them, this is a good start. Answering these questions may not get you the job you are applying for, but learning them will give you some valuable knowledge about Python.
This post will include the questions that are Python specific and I'll post the Django questions separately.

  1. How are arguments passed - by reference or by value?
    The short answer is "neither"; actually it is called "call by object" or "call by sharing" (you can check here for more info). The longer one starts with the fact that this terminology is probably not the best one to describe how Python works. In Python everything is an object and all variables hold references to objects. The values of these references are passed to the functions. As a result you cannot change the value of the reference, but you can modify the object if it is mutable. Remember: numbers, strings and tuples are immutable, lists and dicts are mutable.
  2. Do you know what list and dict comprehensions are? Can you give an example?
    List/dict comprehensions are syntax constructions that ease the creation of a list/dict based on an existing iterable. According to the 3rd edition of "Learning Python", list comprehensions are generally faster than normal loops, but this is something that may change between releases. Examples:
    # simple iteration
    a = []
    for x in range(10):
        a.append(x*2)
    # a == [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    
    # list comprehension
    a = [x*2 for x in range(10)]
    
    # dict comprehension
    a = {x: x*2 for x in range(10)}
    # a == {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14, 8: 16, 9: 18}
    
  3. What is PEP 8?
    PEP 8 is a coding convention (a set of recommendations) on how to write your Python code in order to make it more readable and useful for those who come after you. For more information check PEP 8.
  4. Do you use virtual environments?
    I personally, and most (by my observation) Python developers, find the virtual environment tool extremely useful. Yeah, probably you can live without it, but this will make working on and supporting multiple projects that require different package versions a living hell.
  5. Can you sum all of the elements in the list? How about multiplying them and getting the result?
    # the basic way
    s = 0
    for x in range(10):
        s += x
    
    # the right way
    s = sum(range(10))
    
    
    # the basic way
    s = 1
    for x in range(1, 10):
        s = s * x
    
    # the other way
    from functools import reduce  # needed on Python 3; reduce is a built-in on Python 2
    from operator import mul
    reduce(mul, range(1, 10))
      
    

    As for the last example, I know Guido van Rossum is not a fan of reduce, more info here, but still for some simple tasks reduce can come quite handy.
  6. Do you know what is the difference between lists and tuples? Can you give me an example for their usage?
    First, lists are mutable while tuples are not, and second, tuples can be hashed, e.g. to be used as keys for dictionaries. As an example of their usage, tuples are used when the order of the elements in the sequence matters, e.g. geographic coordinates, a "list" of points in a path or route, or a set of actions that should be executed in a specific order. Don't forget that you can use them as dictionary keys. For everything else use lists.
  7. Do you know the difference between range and xrange?
    Range returns a list while xrange returns an xrange object, which takes the same amount of memory no matter the range size. In the first case you have all items already generated (this can take a lot of time and memory), while in the second you get the elements one by one, i.e. only one element is generated and available per iteration. A simple example of generator usage can be found in problem 2 of the "homework" for my presentation Functions in Python.
  8. Tell me a few differences between Python 2.x and 3.x
    There are many answers here, but for me some of the major changes in Python 3.x are: all strings are now Unicode and print is now a function, not a statement. The old list-returning range is gone: xrange has been renamed to range and the name xrange has been removed. All classes are new-style and the division of integers now returns a float.
  9. What are decorators and what is their usage?
    According to Bruce Eckel's Introduction to Python Decorators "Decorators allow you to inject or modify code in functions or classes". In other words, decorators allow you to wrap a function or class method call and execute some code before or after the execution of the original code. You can also nest them, i.e. use more than one decorator for a specific function. Usage examples include - logging the calls to a specific method, checking for permission(s), checking and/or modifying the arguments passed to the method etc.
  10. The with statement and its usage.
    In a few words, the with statement allows you to execute code before and/or after a specific set of operations. For example, if you open a file for reading and parsing, no matter what happens during the parsing you want to be sure that at the end the file is closed. This is normally achieved using the try... finally construction, but the with statement simplifies it using the so-called "context management protocol". To use it with your own objects you just have to define __enter__ and __exit__ methods. Some standard objects like the file object automatically support this protocol. For more information you may check Understanding Python's "with" statement.
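The last two answers (decorators and the with statement) can be sketched together in a few lines. All names here (logged, Resource, parse, calls) are hypothetical and exist only for this illustration; the list calls simply records the order in which things run.

```python
import functools

calls = []  # records what happened, in order


def logged(func):
    """Decorator: run extra code before and after the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append("before " + func.__name__)
        result = func(*args, **kwargs)
        calls.append("after " + func.__name__)
        return result
    return wrapper


class Resource:
    """Context manager: __enter__ and __exit__ wrap the body of the with block."""

    def __enter__(self):
        calls.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raised; returning False lets the exception propagate.
        calls.append("exit")
        return False


@logged
def parse(resource):
    calls.append("parse")
    return 42


with Resource() as resource:
    answer = parse(resource)

print(answer)
print(calls)
```

functools.wraps is used so that the decorated function keeps its original name and docstring, which helps when the decorator is used for logging.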
    Well, I hope this will be helpful. If you have any questions or suggestions feel free to comment.

    Update: Due to the many comments on Reddit and LinkedIn, I understood that there is some misunderstanding about the post. First, the questions I have published are not the only ones I ask; the interview also includes questions related to general programming knowledge and logical thinking. Second, the questions above help me get a basic understanding of your Python knowledge, but they are not the only thing that drives my decision. Not answering some of them does not mean that you won't get the job, but it may show me which parts we should work on.

    Django for Web Prototyping

    Published at April 15, 2013

    Or how to use the benefits of Django template system during the PSD to HTML phase

    There are two main approaches to start designing a new project - a Photoshop mock-up or an HTML prototype. The first one is more traditional and well established in the web industry. The second one is more alternative and (maybe) modern. I remember a video of Jason Fried from 37 Signals where he talks about design and creativity. You can see it at http://davegray.nextslide.com/jason-fried-on-design. There he explains how he stays away from Photoshop in the initial phase to concentrate on the things that you can interact with instead of focusing on design details.

    I am not planning to argue which is the better method, the important thing here is that sooner or later you get to the point where you have to start the HTML coding. Unfortunately frequently this happens in a pure HTML/CSS environment outside of the Django project and then we waste some extra amount of time to convert it to Django templates.

    Wouldn't it be awesome if you could give the front-end developers something that they can install/run with a simple command and still allow them to work in the Django environment using all the benefits it provides - template nesting and including, sekizai tags etc.?

    I have been planning to do this for a long time and finally it is ready and is available at Django for Prototyping. Currently the default template includes Modernizr, jQuery and jQuery UI but you can easily modify it according to your needs. I would be glad of any feedback and ideas of improvement so feel free to try it and comment.

    Introduction to Django - presentation

    Published at March 18, 2013

    This presentation shows the basics of Django - what is inside the framework - and explains the Model-View-Template system. One of the most important parts is a diagram of how the request is processed and the response is generated. It also shows the project and application structure and the basic elements - Models, URL dispatcher, Views and Templates.

    Introduction to django from Ilian Iliev

    Simple Site Checker and the User-Agent header

    Published at Oct. 22, 2012

    Preface: Nine months ago (I can't believe it was that long) I created a script called Simple Site Checker to ease the checking of sitemaps for broken links. The script code is publicly available on Github. Yesterday (now that I finally found time to finish this post it must be "a few weeks ago") I decided to run it again on this website and nothing happened - no errors, no warnings, nothing. With the output level set to DEBUG, the script showed the message "Loading sitemap ..." and exited.
    Here the fault was mine: I had missed a corner case in the error catching mechanism, i.e. when the sitemap URL returns something different from "200 OK" or "500 internal server error". Just a few seconds and the mistake was fixed.

    Problem and Solution: I ran the script again and, what a surprise, the sitemap URL was returning "403 Forbidden". At the same time the sitemap was perfectly accessible via my browser. After some thinking I remembered that some security plugins block access to the website if no User-Agent header is supplied. The reason for this is to block access by simple scripts. In my case even an empty User-Agent did the trick to fool the plugin.

    urllib2.urlopen(urllib2.Request(url,
                                    headers={'User-Agent': USER_AGENT}))
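The snippet above uses the Python 2 urllib2 module. On Python 3 the same idea could look roughly like the sketch below, which uses urllib.request; the USER_AGENT value and the URL are placeholders for illustration, not the script's real settings, and no request is actually sent.

```python
from urllib.request import Request

# Both values below are placeholders for illustration only.
USER_AGENT = "simple-site-checker-example/0.1"
url = "http://example.com/sitemap.xml"

# Build the request with an explicit User-Agent header; nothing is sent yet.
request = Request(url, headers={"User-Agent": USER_AGENT})

# Request normalises header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))

# Passing `request` to urllib.request.urlopen() would perform the actual fetch.
```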
    

    Final words: As a result of the issue mentioned above, one bug in Simple Site Checker was found and fixed. At the same time another issue about missing status and progress was raised. More details can be found on Github, but in a few words an info message was added for each processed URL to indicate the progress.

    If you have any ideas for improvement or anything else feel free to comment, create issues and/or fork the script.

    Automation, Fabric and Django - presentation

    Published at Oct. 3, 2012

    As a follow up post of Automated deployment with Ubuntu, Fabric and Django here are the slides from my presentation on topic "Automation, Fabric and Django". Unfortunately there is no audio podcast but if there is interest I can add some comments about each slide as part of this post.

    Automation - fabric, django and more from Ilian Iliev

    If there is anything that needs explanation feel free to ask.
