Hus to Know?: August 2010

Tuesday, August 31, 2010

Django's permalink decorator

Django's documentation includes a brief section about the permalink decorator:

http://docs.djangoproject.com/en/1.2/ref/models/instances/

from django.db import models

@models.permalink
def get_absolute_url(self):
    return ('people_view', [str(self.id)])

So what does permalink actually do? The best way to find out is to look inside the __init__.py file in the django.db.models directory:

def permalink(func):
    """                                                                                                                                                                 
    Decorator that calls urlresolvers.reverse() to return a URL using                                                                                                 
    parameters returned by the decorated function "func".                                                                                                               
                                                                                                                                                                        
    "func" should be a function that returns a tuple in one of the                                                                                                      
    following formats:                                                                                                                                                  
        (viewname, viewargs)                                                                                                                                            
        (viewname, viewargs, viewkwargs)                                                                                                                                
    """
    from django.core.urlresolvers import reverse
    def inner(*args, **kwargs):
        bits = func(*args, **kwargs)
        return reverse(bits[0], None, *bits[1:3])
    return inner

So essentially permalink() is a fancy way to invoke Django's reverse() function. It relies on the decorator pattern. (If you really want to understand decorators and closures, see Matt Harrison's PyCon talk at http://python.mirocommunity.org/video/1625/pycon-2010-the-meaty-parts-of- for a really good explanation.)

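But basically, for our model the decorated get_absolute_url() behaves like this:

from django.core.urlresolvers import reverse

def get_absolute_url(self):
    # bits is the tuple returned by the original, undecorated method
    bits = ('people_view', [str(self.id)])
    # reverse(viewname, urlconf, args): the list of view arguments is passed as args
    return reverse('people_view', None, [str(self.id)])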

The *bits[1:3] expression unpacks the slice into separate positional arguments, so viewargs and (if present) viewkwargs become the args and kwargs parameters of reverse(). Again, watch Matt Harrison's PyCon talk, since you'll get a snapshot of his 10 years of Python experience wrapped up in a one-hour talk.
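To see the unpacking in isolation, here is a minimal sketch that runs without Django. The fake_reverse() function and the Person class are hypothetical stand-ins invented for this illustration; fake_reverse() only echoes its arguments back so you can see where each piece of the tuple ends up.

```python
def fake_reverse(viewname, urlconf=None, args=None, kwargs=None):
    """Hypothetical stand-in for Django's reverse(); it just echoes its inputs."""
    return (viewname, urlconf, args, kwargs)


def permalink(func):
    def inner(*args, **kwargs):
        bits = func(*args, **kwargs)
        # bits[1:3] holds viewargs and, if present, viewkwargs. The star
        # passes them as the third and fourth positional arguments.
        return fake_reverse(bits[0], None, *bits[1:3])
    return inner


class Person(object):
    def __init__(self, id):
        self.id = id

    @permalink
    def get_absolute_url(self):
        # Two-element form: (viewname, viewargs)
        return ('people_view', [str(self.id)])

    @permalink
    def get_archive_url(self):
        # Three-element form: (viewname, viewargs, viewkwargs)
        return ('people_archive', (), {'year': 2010})


print(Person(7).get_absolute_url())
print(Person(7).get_archive_url())
```

With the two-element tuple the slice has one item, so kwargs keeps its default of None; with the three-element tuple both args and kwargs are filled in.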

Monday, August 30, 2010

Using Django's photologue and preserving previous image timestamps

If you've ever used Django's photologue module to host photo galleries, you may know that it lets you add to an existing gallery by uploading .ZIP files. The function that handles this is called process_zipfile(). The source for models.py can be found here:

http://code.google.com/p/django-photologue/source/browse/trunk/photologue/models.py

process_zipfile() uses the Python zipfile library to iterate through the sorted list of entries in the ZIP file. It reads each entry's data with read(), then wraps the data in a Django ContentFile object, which ImageField and FileField use to save the data to disk.

for filename in sorted(zip.namelist()):
                if filename.startswith('__'): # do not process meta files
                    continue
                data = zip.read(filename)
                if len(data):
                    try:
                        # the following is taken from django.newforms.fields.ImageField:
                        #  load() is the only method that can spot a truncated JPEG,
                        #  but it cannot be called sanely after verify()
                        trial_image = Image.open(StringIO(data))
                        trial_image.load()
                        # verify() is the only method that can spot a corrupt PNG,
                        #  but it must be called immediately after the constructor
                        trial_image = Image.open(StringIO(data))
                        trial_image.verify()
                    except Exception:
                        # if a "bad" file is found we just skip it.
                        continue
                    while 1:
                        title = ' '.join([self.title, str(count)])
                        slug = slugify(title)
                        try:
                            p = Photo.objects.get(title_slug=slug)
                        except Photo.DoesNotExist:
                            photo = Photo(title=title,
                                          title_slug=slug,
                                          caption=self.caption,
                                          is_public=self.is_public,
                                          tags=self.tags)
                            photo.image.save(filename, ContentFile(data))
                            gallery.photos.add(photo)
                            count = count + 1
                            break
                        count = count + 1

But what if you wanted to preserve the original datetime stamps from each image?

One way to do it is to look up the entry's last-modified time in the .ZIP file (by calling the getinfo() method with the filename), and then use time.mktime() to convert its date_time attribute into a timestamp that os.utime() accepts.

image.file.save(file_name, ContentFile(data))

                # http://twistedmatrix.com/trac/changeset/19275/branches/dir-1951/twisted/python/zippath.py                                                            
                # Change timestamp to that set in the .ZIP file                                                                                                        
                zip = self.zfile.getinfo(file_name)
                mtime = time.mktime(zip.date_time + (0,0,0))

                os.utime(os.path.join(settings.MEDIA_ROOT, image.file.name), (mtime, mtime))
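The same idea can be tried outside Django. The sketch below is written for a current Python 3 and uses a hypothetical helper, extract_with_mtime(), plus a throwaway archive; none of these names come from photologue. One deliberate difference from the snippet above: it pads date_time with (0, 0, -1) rather than (0, 0, 0), so the C library decides whether daylight saving time applies.

```python
import os
import tempfile
import time
import zipfile


def extract_with_mtime(zip_path, member, dest_dir):
    """Extract one member and give it the timestamp stored in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)
        target = zf.extract(member, dest_dir)
    # date_time is (year, month, day, hour, minute, second). mktime() wants
    # nine fields; weekday and yearday are ignored, and -1 asks the C
    # library to work out daylight saving time itself.
    mtime = time.mktime(info.date_time + (0, 0, -1))
    os.utime(target, (mtime, mtime))
    return target, mtime


# Build a throwaway archive whose single entry carries a known timestamp.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "photos.zip")
with zipfile.ZipFile(archive, "w") as zf:
    entry = zipfile.ZipInfo("dragon.jpg", date_time=(2010, 8, 30, 12, 0, 0))
    zf.writestr(entry, b"not really a jpeg")

path, mtime = extract_with_mtime(archive, "dragon.jpg", work_dir)
print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(os.path.getmtime(path))))
```

ZIP entries store local time without a time zone, so the restored timestamp is interpreted in the time zone of the machine doing the extraction.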

Please note that after making these changes you will likely also need the following imports:

from django.conf import settings
import os
import time
import zipfile

Friday, August 27, 2010

The ultimate one-liner to ZIP a bunch of different directories up into separate files...

Suppose you have a bunch of directories that you want to ZIP up, each into its own ZIP file. You also want to flatten any subdirectories within those directories.

The discussion linked below advises against piping ls into awk. The reason? If any directory names contain spaces, awk's field splitting will break them apart. For more info, see here:

http://stackoverflow.com/questions/1447809/awk-print-9-the-last-ls-l-column-including-any-spaces-in-the-file-name

Eventually, I settled on this line:

find . -maxdepth 1 -type d -print | awk '{if ($0 == ".") next; print "zip -r -j \""$0".zip\" \""$0"\""}' | sh

The "-type d" option restricts the search to directories. I pipe the result to awk, which skips the "." entry. Notice also that I use $0 with awk: since the default field delimiter is whitespace, a directory name containing spaces would be split across $1, $2, and so on, whereas $0 holds the whole line.

I use the "-r" option to scan the directories recursively and the "-j" to junk the directory name.
In addition, I add surrounding quotation marks for both the zip file and directory (\") to avoid Unix parsing issues (in case the directories have spaces in them).

I then pipe the final result to 'sh', which will execute the statement.

If you're in doubt, you can remove the final sh pipe ("| sh") to see how this command works for you!
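If you would rather not generate shell commands at all, roughly the same result can be had from Python's zipfile module. This is only a sketch of an alternative, not a drop-in replacement for the one-liner: zip_each_directory() is a hypothetical name, and unlike zip it will store duplicate entries if two subdirectories contain files with the same name.

```python
import os
import zipfile


def zip_each_directory(root="."):
    """Create <dir>.zip next to every immediate subdirectory of root."""
    created = []
    for entry in sorted(os.listdir(root)):
        dir_path = os.path.join(root, entry)
        if not os.path.isdir(dir_path):
            continue
        zip_path = dir_path + ".zip"
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            # os.walk() descends into subdirectories, like zip -r.
            for current, _dirs, files in os.walk(dir_path):
                for name in files:
                    # arcname=name drops the directory part, like zip -j.
                    zf.write(os.path.join(current, name), arcname=name)
        created.append(zip_path)
    return created
```

Because the directory names never pass through a shell, spaces in them need no quoting here.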

Wednesday, August 25, 2010

How Python's unittest library figures out what functions to run..

Django's django.test.TestCase ultimately inherits from the Python unittest library, so I was curious to figure out how Python actually discerns what test functions to run. For instance, how does unittest.TestCase() ultimately know that there are two test methods (test_details(), and test_index()) to invoke?

from django.test import TestCase

class SimpleTest(TestCase):
    def test_details(self):
        response = self.client.get('/customer/details/')
        self.failUnlessEqual(response.status_code, 200)

    def test_index(self):
        response = self.client.get('/customer/index/')
        self.failUnlessEqual(response.status_code, 200)

Well, we can start by tracing into the django.test.TestCase code:

django.test.testcases.py:

class TestCase(TransactionTestCase):

...inherits:

class TransactionTestCase(unittest.TestCase):

..
unittest.py (Python library):

class TestCase:
    """A class whose instances are single test cases.

    By default, the test code itself should be placed in a method named
    'runTest'.

When you invoke python manage.py test, Django uses django.test.simple.DjangoTestSuiteRunner, which runs the suite through DjangoTestRunner, a subclass of unittest.TextTestRunner. Inside django.test.simple.py, there is the following line:

suite.addTest(unittest.defaultTestLoader.loadTestsFromModule(test_module))

The actual instantiation is declared inside unittest.py:

defaultTestLoader = TestLoader()

loadTestsFromModule() is invoked first. It calls loadTestsFromTestCase() for each TestCase subclass it finds in the module, which in turn invokes getTestCaseNames():

unittest.py:

def loadTestsFromTestCase(self, testCaseClass):
        """Return a suite of all tests cases contained in testCaseClass"""
        if issubclass(testCaseClass, TestSuite):
            raise TypeError("Test cases should not be derived from TestSuite. Maybe you meant to derive from TestCase?")
        testCaseNames = self.getTestCaseNames(testCaseClass)
        if not testCaseNames and hasattr(testCaseClass, 'runTest'):
            testCaseNames = ['runTest']
        return self.suiteClass(map(testCaseClass, testCaseNames))

The getTestCaseNames() method searches for all attributes whose names begin with the loader's testMethodPrefix (by default "test", with no underscore required) and that are callable:

def getTestCaseNames(self, testCaseClass):
        """Return a sorted sequence of method names found within testCaseClass
        """
        def isTestMethod(attrname, testCaseClass=testCaseClass, prefix=self.testMethodPrefix):
            return attrname.startswith(prefix) and callable(getattr(testCaseClass, attrname))
        testFnNames = filter(isTestMethod, dir(testCaseClass))
        for baseclass in testCaseClass.__bases__:
            for testFnName in self.getTestCaseNames(baseclass):
                if testFnName not in testFnNames:  # handle overridden methods
                    testFnNames.append(testFnName)
        if self.sortTestMethodsUsing:
            testFnNames.sort(self.sortTestMethodsUsing)
        return testFnNames

If you invoke the method directly in the Python interpreter, you can see how it works:

>>> import unittest
>>> b = unittest.TestLoader()
>>> b.getTestCaseNames(SimpleTest)
['test_details', 'test_index']
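The interpreter session above needs the Django test class to be importable. As a self-contained variation, the sketch below uses a made-up SampleTest class to show which attributes the loader picks up and which it skips.

```python
import unittest


class SampleTest(unittest.TestCase):
    def test_index(self):
        pass

    def test_details(self):
        pass

    def testlogin(self):
        # No underscore, but the name still starts with the "test" prefix.
        pass

    def helper(self):
        # Wrong prefix, so the loader ignores it.
        pass

    # Right prefix, but not callable, so the loader ignores it too.
    test_data = {"not": "callable"}


loader = unittest.TestLoader()
names = loader.getTestCaseNames(SampleTest)
print(names)
```

The prefix lives in loader.testMethodPrefix, so a loader with a different prefix would collect a different set of methods.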

Test-driven development with Django ImageField

Suppose you had the following form and model:

admin.py:

class ImageGalleryAdminForm(forms.ModelForm):
    archive = forms.FileField(label="ZIP archive", required=False)

    class Meta:
        model = get_model('uploads', 'ImageGallery')

models.py:
class Image(BaseFileModel):
    file = models.ImageField(upload_to=get_image_path)
    gallery = models.ForeignKey(ImageGallery, default=get_default_gallery)

How would you write test cases to verify that you could actually upload a ZIP archive? Well, here's how the Django documentation describes things:

http://docs.djangoproject.com/en/dev/topics/testing/

Submitting files is a special case. To POST a file, you need only provide the file field name as a key, and a file handle to the file you wish to upload as a value. For example:

>>> c = Client()
>>> f = open('wishlist.doc')
>>> c.post('/customers/wishes/', {'name': 'fred', 'attachment': f})
>>> f.close()
(The name attachment here is not relevant; use whatever name your file-processing code expects.)

Why can you pass a file handle inside a dictionary? You have to dig into the django.test.client code to understand why it's even possible.

django.test.client:

    def post(self, path, data={}, content_type=MULTIPART_CONTENT,
             follow=False, **extra):
        """                                                                                                                                                             
        Requests a response from the server using POST.                                                                                                                 
        """
        if content_type is MULTIPART_CONTENT:
            post_data = encode_multipart(BOUNDARY, data)

Notice that content_type defaults to MULTIPART_CONTENT. In that case post() invokes the function encode_multipart(), which contains these important lines:

def encode_multipart(boundary, data):
    """                                                                                                                                                                 
    Encodes multipart POST data from a dictionary of form values.  

    The key will be used as the form data name; the value will be transmitted                                                                                           
    as content. If the value is a file, the contents of the file will be sent                                                                                           
    as an application/octet-stream; otherwise, str(value) will be sent.                                                                                                 
    """
.
.
.
    # Not by any means perfect, but good enough for our purposes.                                                                                                       
    is_file = lambda thing: hasattr(thing, "read") and callable(thing.read)
    for (key, value) in data.items():
        if is_file(value):
            lines.extend(encode_file(boundary, key, value))

Django creates a lambda function that checks whether the value you pass in has a callable read attribute. If it does, Django treats the dictionary value as a file: it invokes encode_file() and creates a MIME-encoded submission.
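This duck-typing check is easy to try on its own. The sketch below copies the is_file test into a plain function; the FakeUpload and NotAFile classes are hypothetical and exist only to show that anything with a callable read() passes, whether or not it is a real file.

```python
import io


def is_file(thing):
    # Same test as the lambda in encode_multipart(): duck typing on read().
    return hasattr(thing, "read") and callable(thing.read)


class FakeUpload(object):
    """Hypothetical object that merely looks like a file."""
    def read(self):
        return b"data"


class NotAFile(object):
    """Has a read attribute, but it is not callable."""
    read = "just a string"


print(is_file(io.BytesIO(b"PK\x03\x04")))
print(is_file(FakeUpload()))
print(is_file("wishlist.doc"))
print(is_file(NotAFile()))
```

This is why the test client accepts an open file handle, an in-memory buffer, or any custom object as long as it can be read.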

Try it out for yourself at the Python interpreter:

python manage.py shell

>>> from django.test.client import BOUNDARY, encode_multipart
>>> f = open("/tmp/dragon.zip", "rb")
>>> encode_multipart(BOUNDARY, {'d' : f})

You should start to see something like the following:
'--BoUnDaRyStRiNg\r\nContent-Disposition: form-data; name="d"; filename="dragon.zip"\r\nContent-Type: application/octet-stream\r\n\r\nPK\x03\x04\x14\x00\x00\x00\x08\x00\'G\xbc<\xcb\x99\x05\xbb:\xaa\x00\x00\xb8\xaa\x00\x00\n\x00\x00\x00dragon.jpg\

Monday, August 23, 2010

Django v1.2 and test fixtures

I started using Django test fixtures with unit tests and noticed one interesting issue with the following code:

import unittest

from django.contrib.admin.sites import LOGIN_FORM_KEY
from django.test.client import Client

class SimpleTest(unittest.TestCase):

    fixtures = ['auth.json']

    def setUp(self):
        # Every test needs a client.
        self.client = Client()

        self.super_login = {
                     LOGIN_FORM_KEY: 1,
                     'username': 'myadminuser',
                     'password': 'testing'}

    def testLogin(self):

        from django.contrib.auth.models import User
        print str(User.objects.all())
        
        # Issue a GET request.
        url = '/admin/'
        response = self.client.get(url)
        self.failUnlessEqual(response.status_code, 200)         # Check that the response is 200 OK.

        print str(self.super_login)
        login = self.client.post(url, self.super_login)
        print str(login)

Before running this test code, I created a fixture by dumping all the users in the auth tables into a file in the fixtures/ directory of the apapa.events module:

./manage.py dumpdata --indent=2 auth > apapa/events/fixtures/auth.json

I then ran the test case to see whether the fixture would load:

./manage.py test apapa.events

I observed that the User table was not populated. As soon as I renamed apapa/events/fixtures/auth.json to apapa/events/fixtures/initial_data.json, though, I saw this message:

Installing json fixture 'initial_data' from '/data/hg/apapa-caktus/apapa/events/fixtures'.
Installed 1 object(s) from 1 fixture(s)
[]

What's wrong? Well, it turns out that whether your test class inherits from Python's unittest.TestCase or from django.test.TestCase matters a lot: the fixtures declaration is only honored by the latter.

from django.test.client import Client
from django.test import TestCase

class SimpleTest(TestCase):

    fixtures = ['auth.json']

The TestCase class in django.test contains the code below, which calls Django's loaddata management command to load each fixture into the database. Python's unittest library, on the other hand, knows nothing about fixtures, so the fixtures attribute is silently ignored if your test case derives from the wrong class. In this case, you should inherit from django.test.TestCase to take advantage of fixture loading:

for db in databases:
            if hasattr(self, 'fixtures'):
                call_command('loaddata', *self.fixtures, **{
                                                            'verbosity': 0,
                                                            'commit': False,
                                                            'database': db
                                                            })
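The effect can be imitated without Django. In the toy sketch below, FixtureAwareTestCase is a hypothetical stand-in for django.test.TestCase: it only records fixture names in a list instead of loading anything. It is not how Django implements fixture loading; it just shows that a class attribute does nothing unless some base class looks for it.

```python
import unittest

LOADED = []  # records which fixture names were "loaded"


class FixtureAwareTestCase(unittest.TestCase):
    """Hypothetical stand-in for django.test.TestCase."""
    def setUp(self):
        # Mirrors the hasattr(self, 'fixtures') check in the Django excerpt.
        if hasattr(self, "fixtures"):
            LOADED.extend(self.fixtures)


class PlainTest(unittest.TestCase):
    fixtures = ["auth.json"]  # nothing in unittest ever reads this

    def test_nothing(self):
        pass


class AwareTest(FixtureAwareTestCase):
    fixtures = ["auth.json"]

    def test_nothing(self):
        pass


def run_case(case_class):
    """Run one test class and report which fixtures were recorded."""
    del LOADED[:]
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case_class)
    suite.run(unittest.TestResult())
    return list(LOADED)


print(run_case(PlainTest))
print(run_case(AwareTest))
```

Both classes declare the same fixtures list, but only the one whose base class inspects the attribute ends up with anything recorded.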

This thread tipped me off to the issue:
http://groups.google.com/group/django-users/browse_thread/thread/3c1a6933e8b6f4fd/dc7e16a0f5cd337e?show_docid=dc7e16a0f5cd337e

Background info: The initial_data is used to populate a database when you first run manage.py syncdb. Take a look at this thread for more info:

http://www.mail-archive.com/django-updates@googlegroups.com/msg55739.html

Monday, August 16, 2010

Using the upload_to parameter inside Django ImageFields..

There are a bunch of different blog entries that talk about using the upload_to parameter inside Django:

http://adil.2scomplement.com/2009/01/django-playing-with-upload_to-field/

http://scottbarnham.com/blog/2007/07/31/uploading-images-to-a-dynamic-path-with-django/

But what are the implications and pitfalls of passing a callable as upload_to? This section covers a major one that I uncovered while experimenting with it over the past several days.

For instance:

def get_image_path(instance, filename):

    if instance.gallery is None:
        return os.path.join('uploads/images', filename)
    else:
        return os.path.join('uploads/images', instance.gallery.slug, filename)

class Image(BaseFileModel):
    file = models.ImageField(upload_to=get_image_path)

Now suppose we create a file called /data/hg/apapa-caktus/apapa/media/tmp/test and then run the following Python code at the interpreter:

>>> from django.core.files.base import ContentFile
>>> from myproject.models import Image
>>> i = Image()
>>> file_path = '/data/hg/apapa-caktus/apapa/media/tmp/test'
>>> fh = ContentFile(open(file_path, 'r').read())
>>> i.file.save(file_path, fh)
>>> i.file
<ImageFieldFile: /data/hg/apapa-caktus/apapa/media/tmp/test_1>

You'll notice that the entire absolute directory and filename got stored. What we really expected was to see the filename get stored as 'uploads/images/tmp/test_1'.

Why does this happen? We don't see this issue when passing a static string as upload_to to ImageField():

class Image(BaseFileModel):
    file = models.ImageField(upload_to='media/uploads')

So why do we get this problem when we're using a callable? There are two reasons. First, os.path.join() treats any component that is an absolute path as the new start of the result (see http://docs.python.org/library/os.path.html):

If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues.

So if we pass in a filename with an absolute path, get_image_path() ends up returning that full pathname unchanged. Essentially, we're invoking os.path.join() like this:

>>> os.path.join('dsds', 'dsdsds', '/tmp/test')
'/tmp/test'

The second reason requires understanding how upload_to works. If you peer inside django.db.models.fields.files.py, which is where ImageField is defined, you'll see that a callable upload_to replaces the default self.generate_filename() method:

def __init__(self, verbose_name=None, name=None, upload_to='', storage=None, **kwargs):
        for arg in ('primary_key', 'unique'):
            if arg in kwargs:
                raise TypeError("'%s' is not a valid argument for %s." % (arg, self.__class__))

        self.storage = storage or default_storage
        self.upload_to = upload_to
        if callable(upload_to):
            self.generate_filename = upload_to

One thing to note about the default generate_filename() is that it invokes get_filename(), which in turn invokes os.path.basename(filename). os.path.basename() strips any leading directory path and returns only the filename.

def get_filename(self, filename): 
        return os.path.normpath(self.storage.get_valid_name(os.path.basename(filename)))

    def generate_filename(self, instance, filename):
        return os.path.join(self.get_directory_name(), self.get_filename(filename))

In other words, when our own get_image_path() replaces generate_filename(), get_filename() is never called, so we must call os.path.basename() ourselves!

So our function should look like:

def get_image_path(instance, filename):

    filename = os.path.basename(filename)

    if instance.gallery is None:
        return os.path.join('uploads/images', filename)
    else:
        return os.path.join('uploads/images', instance.gallery.slug, filename)
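The difference is easy to check without Django. The sketch below is a simplified illustration: FakeInstance is a hypothetical stand-in for an Image with no gallery, the gallery branch is left out, and posixpath (which is what os.path resolves to on Unix) is used so the result does not depend on the platform.

```python
import posixpath


class FakeInstance(object):
    """Hypothetical stand-in for an Image that has no gallery attached."""
    gallery = None


def get_image_path_naive(instance, filename):
    # An absolute filename makes join() discard 'uploads/images'.
    return posixpath.join('uploads/images', filename)


def get_image_path_safe(instance, filename):
    # Strip any directory part first, as Django's get_filename() does.
    filename = posixpath.basename(filename)
    return posixpath.join('uploads/images', filename)


source = '/data/hg/apapa-caktus/apapa/media/tmp/test'
print(get_image_path_naive(FakeInstance(), source))
print(get_image_path_safe(FakeInstance(), source))
```

The naive version hands back the absolute source path, while the safe version returns a path relative to the upload directory.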

Hopefully you will avoid the pitfalls that I encountered! Remember: os.path.basename() is critical to how Django generates upload paths, and any callable you pass as upload_to will probably need it too.