The road to hell is paved with regular expressions ...

Published at July 31, 2012 | Tagged with: , ,

... or what is the cost of using regular expressions for simple tasks

Regular expressions are one of the most powerful tools in computing I have ever seen. My previous post about Django compressor and image preloading is a good example how useful they might be. The only limit of their use is your imagination. But "with great power, comes great responsibility" or in this case a great cost. Even the simplest expressions can be quite heavy compared with other methods.

The reason to write about this is a question recently asked in a python group. It was about how to get the elements of a list that match specific string. My proposal was to use comprehension list and simple string comparison while other member proposed using a regular expression. I was pretty sure that the regular expression is slower but not sure exactly how much slower so I made a simple test to find out.

import re
import timeit

my_list = ['abc-123', 'def-456', 'ghi-789', 'abc456', 'abc', 'abd']

def re_check():
    return [i for i in my_list if re.match('^abc$', i)]

t = timeit.Timer(re_check)
print 're_check result >>', re_check()
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

def simple_check():
    return [i for i in my_list if i=='abc']

t = timeit.Timer(simple_check)
print 'simple_check result >>', simple_check()
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

The results was 23.99 vs 1.41 usec/pass respectively for regular expression vs direct comparison i.e. the regexp was 17 times slower. The difference from the example above may be OK in some cases but it rises with the size of the list. This is a simple example how something really quick on local version may take significant time on production and even to broke your application.

So, should you learn and use regular expressions?
Yes! Absolutely!

They are powerful and useful. They will open your mind and allow you to do things you haven't done before. But remember that they are a double-edge razor and should be used cautiously. If you can avoid it with other comparison(one or more) just run a quick test to see whether it will be faster. Of course if you can not avoid it you can also think about caching the results.