Archive for March, 2010

Django based startup – YouTellMe.nl

YouTellMe.nl is a completely Django-based startup located in Amsterdam. We’re probably one of the largest Django projects in terms of codebase. Pretty soon the famous Next Web awards are coming up. (Remember those guys who broke into Michael Arrington’s house?) And we need some love from the Django community. Lots of it.

Pretty please nominate YouTellMe.nl for the Next Web awards!
http://dsa.thenextweb.com/?lang=nl

Good articles coming soon to offset the bad karma for this shameless plug ;)

Update:

A couple of stats about the current codebase.

According to wc (including whitespace and comments) we currently have:

  • Python: 3,363,188 characters, 87,926 lines, 655 files
  • JavaScript: 940,536 characters, 24,229 lines, 77 files

Django & Events & Web Development & YouTellMe tschellenbach 19 Mar 2010 3 Comments

Django query set iterator – for really large querysets

When you try to iterate over a query set with about 0.5 million items (a few hundred megs of db storage), the memory usage can become somewhat problematic. Adding .iterator() to your query set helps somewhat, but still loads the entire query result into memory. Cronjobs at YouTellMe.nl were unfortunately starting to fail. My colleague Rick came up with the following fix.

This solution splits the querying into chunks of 1000 rows (by default). While this is somewhat heavier on your database (multiple queries), it seriously reduces the memory usage. Curious to hear how other Django developers have worked around this problem.

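import gc

def queryset_iterator(queryset, chunksize=1000):
    '''
    Iterate over a Django queryset ordered by the primary key.

    This method loads a maximum of chunksize (default: 1000) rows into
    memory at the same time, while Django would normally load all rows into
    memory. Using the iterator() method only causes it to not preload all the
    classes.

    Note that the implementation of the iterator does not support ordered
    query sets, and it assumes positive integer primary keys.
    '''
    # Look up the highest primary key; an empty queryset yields nothing
    # (indexing [0] directly would raise IndexError).
    last_rows = list(queryset.order_by('-pk')[:1])
    if not last_rows:
        return
    last_pk = last_rows[0].pk
    pk = 0
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        # Fetch the next chunk of rows with a primary key above the last one seen.
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        # Free the model instances of the previous chunk.
        gc.collect()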

# Some examples:
# old
MyItem.objects.all()

# better
MyItem.objects.all().iterator()

# even better
queryset_iterator(MyItem.objects.all())
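To see the chunking idea without a Django project at hand, here is a minimal sketch of the same "primary key greater than the last one seen" pattern using only Python's built-in sqlite3 module. The items table, its columns and the iter_items name are made up for illustration. It is also a slight variation rather than Rick's exact approach: it stops when a query returns no rows instead of looking up the highest key first.

```python
import sqlite3


def iter_items(conn, chunksize=1000):
    """Yield (id, name) rows from the hypothetical `items` table in id order.

    Each query fetches at most `chunksize` rows whose id is greater than the
    last id seen, so only one chunk is held in memory at a time.
    """
    last_id = None
    while True:
        if last_id is None:
            # First chunk: no lower bound on the id yet.
            rows = conn.execute(
                "SELECT id, name FROM items ORDER BY id LIMIT ?",
                (chunksize,),
            ).fetchall()
        else:
            # Later chunks: continue after the last id already yielded.
            rows = conn.execute(
                "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, chunksize),
            ).fetchall()
        if not rows:
            return  # no rows left (this also covers an empty table)
        for row in rows:
            yield row
        last_id = rows[-1][0]


# Demo data: an in-memory database with 2500 rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, "item-%d" % i) for i in range(1, 2501)],
)

ids = [row[0] for row in iter_items(conn, chunksize=1000)]
print(len(ids))
```

The trade-off is the same as in the Django version: more queries against the database, but memory use is bounded by the chunk size instead of by the size of the table.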

Django snippet here.

Django & Python & Web Development & YouTellMe tschellenbach 03 Mar 2010 288 Comments