Working with intervals in Python

Published at Feb. 22, 2015

Brief: Working with intervals in Python is really easy, fast and simple. If you want to learn more just keep reading.

Task description: Lets say that the case if the following, you have multiple users and each one of them has achieved different number of points on your website. So you want, to know how many users haven't got any point, how many made between 1 and 50 points, how many between 51 and 100 etc. In addition at 1000 the intervals start increasing by 100 instead of 50.

Preparing the intervals: Working with lists in Python is so awesome, so creating the intervals is quite a simple task.

intervals = [0] + \  # The zero intervals
            [x * 50 for x in range(1, 20)] + \  # The 50 intervals
            [x * 100 for x in range(10, 100)] + \  # The 100 intervals
            [x * 1000 for x in range(10, 102)]  # the 1000 intervals

So after running the code above we will have a list with the maximum number of points for each interval. Now it is time to prepare the different buckets that will store the users count. To ease this we are going to use a defaultdict.

from collections import defaultdict

buckets = defaultdict(lambda: 0)

This way, we can increase the count for each bucket without checking if it exists. Now lets got to counting

for user in users:
    try:
        bucket = intervals[bisect.bisect_left(intervals, user.points)]
    except IndexError:
        # we are over the last bucket, so we put in in it
        bucket = intervals[-1]

buckets[bucket] += 1

How it works: Well it is quite simple. The bisect.bisect_left uses a binary search to estimate the position where an item should be inserted to keep the list, in our case intervals sorted. Using the position we take the value from the invervals that represent the bucker where the specified number should go. And we are ready. The result will looks like:

    {
        1: 10,
        10: 5,
        30: 8,
        1100: 2
    }

Final words: As you see when the default dict is used it does not have values for the empty buckets. This can be good or bad depending from the requirements how to present the data but it can be esily fixed by using the items from the intervals as keys for the buckets.

P.S. Comments and ideas for improvement are always welcome.