Monday, August 16, 2010

Using the upload_to parameter inside Django ImageFields

There are a bunch of different blog entries that talk about using the upload_to parameter inside Django:

http://adil.2scomplement.com/2009/01/django-playing-with-upload_to-field/

http://scottbarnham.com/blog/2007/07/31/uploading-images-to-a-dynamic-path-with-django/

But what are the implications and pitfalls of using a callable for upload_to? This post covers a major one that I uncovered while experimenting with it over the past several days.

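For instance:
import os

from django.db import models


def get_image_path(instance, filename):
    # Build the upload path, nesting it under the gallery slug if there is one.
    if instance.gallery is None:
        return os.path.join('uploads/images', filename)
    else:
        return os.path.join('uploads/images', instance.gallery.slug, filename)


class Image(BaseFileModel):
    # BaseFileModel comes from this project (not shown here).
    file = models.ImageField(upload_to=get_image_path)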
Now suppose we create a file called /data/hg/apapa-caktus/apapa/media/tmp/test and then run the following Python code in the interpreter:
>>> from django.core.files.base import ContentFile
>>> from myproject.models import Image
>>> i = Image()
>>> file_path = '/data/hg/apapa-caktus/apapa/media/tmp/test'
>>> fh = ContentFile(open(file_path, 'r').read())
>>> i.file.save(file_path, fh)
>>> i.file
<ImageFieldFile: /data/hg/apapa-caktus/apapa/media/tmp/test_1>
You'll notice that the entire absolute directory and filename got stored (the '_1' suffix was added because a file already existed at that path). What we really expected was a name relative to MEDIA_ROOT, under 'uploads/images/'.
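To see that the callable itself produces the absolute path, we can call the original get_image_path() directly, with no Django involved. This is a minimal sketch: types.SimpleNamespace stands in for an Image instance with no gallery (an assumption; a real instance would come from the model), and the output assumes a POSIX system.

```python
import os
from types import SimpleNamespace


def get_image_path(instance, filename):
    # Original version from above: the incoming filename is used as-is.
    if instance.gallery is None:
        return os.path.join('uploads/images', filename)
    else:
        return os.path.join('uploads/images', instance.gallery.slug, filename)


# Stand-in for an Image instance that has no gallery assigned.
fake_image = SimpleNamespace(gallery=None)

result = get_image_path(fake_image, '/data/hg/apapa-caktus/apapa/media/tmp/test')
print(result)  # /data/hg/apapa-caktus/apapa/media/tmp/test
```

Django passes whatever name this function returns on to the storage backend, which is how the absolute path ended up in i.file.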

Why does this happen? We don't see this issue when upload_to is a static string:
class Image(BaseFileModel):
    file = models.ImageField(upload_to='media/uploads')
So why do we get this problem when upload_to is a callable? There are two reasons. First, os.path.join() treats an absolute-path argument as the new start of the join (see http://docs.python.org/library/os.path.html):
If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues.
So if the filename passed to get_image_path() is an absolute path, the function returns that full pathname unchanged. Essentially, we're invoking os.path.join() like this:
>>> import os
>>> os.path.join('dsds', 'dsdsds', '/tmp/test')
'/tmp/test'
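The rule is easy to check directly. The snippet below (assuming a POSIX system, where os.path is posixpath) shows that joining restarts at the last absolute component, that relative components after it are still appended, and that stripping the directory part first keeps the result relative.

```python
import os

# Everything before the absolute component is discarded.
print(os.path.join('dsds', 'dsdsds', '/tmp/test'))  # /tmp/test

# Relative components after the absolute one are still joined on.
print(os.path.join('uploads', '/tmp', 'a', 'b'))  # /tmp/a/b

# Taking the basename first keeps the join relative.
print(os.path.join('uploads/images', os.path.basename('/tmp/test')))  # uploads/images/test
```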
The second reason requires understanding how upload_to works. If you look inside django/db/models/fields/files.py, where FileField (the parent class of ImageField) is defined, you'll see that a callable upload_to replaces the default self.generate_filename() method:
def __init__(self, verbose_name=None, name=None, upload_to='', storage=None, **kwargs):
    for arg in ('primary_key', 'unique'):
        if arg in kwargs:
            raise TypeError("'%s' is not a valid argument for %s." % (arg, self.__class__))

    self.storage = storage or default_storage
    self.upload_to = upload_to
    if callable(upload_to):
        self.generate_filename = upload_to
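The key detail is that the callable is assigned as an instance attribute, so it is not bound as a method: calling self.generate_filename(instance, filename) calls upload_to(instance, filename) directly. Below is a heavily simplified, hypothetical stand-in for FileField (SketchFileField and naive_upload_to are illustrative names, not Django code; storage, strftime expansion and validation are omitted) that reproduces just this dispatch. Outputs assume a POSIX system.

```python
import os


class SketchFileField:
    """Hypothetical, simplified stand-in for Django's FileField dispatch."""

    def __init__(self, upload_to=''):
        self.upload_to = upload_to
        if callable(upload_to):
            # The instance attribute shadows the method below. It is a plain
            # function, so it receives (instance, filename) with no self.
            self.generate_filename = upload_to

    def get_directory_name(self):
        return os.path.normpath(self.upload_to)

    def get_filename(self, filename):
        # The default path strips directories from the incoming name.
        return os.path.basename(filename)

    def generate_filename(self, instance, filename):
        return os.path.join(self.get_directory_name(), self.get_filename(filename))


def naive_upload_to(instance, filename):
    # Like the original get_image_path(), this skips basename().
    return os.path.join('uploads/images', filename)


static_field = SketchFileField(upload_to='media/uploads')
callable_field = SketchFileField(upload_to=naive_upload_to)

print(static_field.generate_filename(None, '/tmp/test'))    # media/uploads/test
print(callable_field.generate_filename(None, '/tmp/test'))  # /tmp/test
```

The static string goes through the default get_filename(), while the callable bypasses it completely.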
Note that the default generate_filename() calls get_filename(), which in turn calls os.path.basename(filename). os.path.basename() strips off any leading directory components and returns just the filename:
def get_filename(self, filename):
    return os.path.normpath(self.storage.get_valid_name(os.path.basename(filename)))

def generate_filename(self, instance, filename):
    return os.path.join(self.get_directory_name(), self.get_filename(filename))
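To make the default pipeline concrete, here is a standalone approximation of what get_filename() does to the incoming name. The regex-based valid_name() is an assumption modelled on django.utils.text.get_valid_filename() (spaces become underscores, and characters other than letters, digits, '-', '_' and '.' are dropped); it is not Django's own code, and default_get_filename is a hypothetical name.

```python
import os
import re


def valid_name(name):
    # Approximation of get_valid_filename(): trim, replace spaces,
    # then drop anything that is not a word character, '-' or '.'.
    name = name.strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', name)


def default_get_filename(filename):
    # Same order of calls as get_filename(): basename, valid name, normpath.
    return os.path.normpath(valid_name(os.path.basename(filename)))


print(default_get_filename('/data/hg/apapa-caktus/apapa/media/tmp/test'))  # test
print(default_get_filename('/home/me/My Photo (1).jpg'))  # My_Photo_1.jpg
```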
In other words, since our get_image_path() replaces generate_filename() entirely, it has to call os.path.basename() itself!

So our function should look like this:
import os


def get_image_path(instance, filename):
    # Drop any directory components, as Django's default get_filename() does.
    filename = os.path.basename(filename)

    if instance.gallery is None:
        return os.path.join('uploads/images', filename)
    else:
        return os.path.join('uploads/images', instance.gallery.slug, filename)
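A quick way to check the fixed function without a database is to call it with stand-in instances. As before, the SimpleNamespace objects are assumptions standing in for an Image with and without a gallery, 'summer' is a made-up slug, and the outputs assume a POSIX system.

```python
import os
from types import SimpleNamespace


def get_image_path(instance, filename):
    # Keep only the final path component, as Django's default does.
    filename = os.path.basename(filename)

    if instance.gallery is None:
        return os.path.join('uploads/images', filename)
    else:
        return os.path.join('uploads/images', instance.gallery.slug, filename)


no_gallery = SimpleNamespace(gallery=None)
in_gallery = SimpleNamespace(gallery=SimpleNamespace(slug='summer'))

path = '/data/hg/apapa-caktus/apapa/media/tmp/test'
print(get_image_path(no_gallery, path))  # uploads/images/test
print(get_image_path(in_gallery, path))  # uploads/images/summer/test
```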
Hopefully this helps you avoid the pitfall I ran into! Remember: os.path.basename() is critical to how Django generates upload paths, so any callable you pass to upload_to will probably need to call it too.
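If you write several upload_to callables, one option is to apply os.path.basename() in a single place. The decorator below is a hypothetical helper (strip_directories and get_document_path are illustrative names, not part of Django), sketched under the assumption that every callable should only ever see a bare filename; the SimpleNamespace instance is again just a stand-in, and the output assumes a POSIX system.

```python
import functools
import os
from types import SimpleNamespace


def strip_directories(upload_to):
    """Hypothetical decorator: pass only the basename to an upload_to callable."""
    @functools.wraps(upload_to)
    def wrapper(instance, filename):
        return upload_to(instance, os.path.basename(filename))
    return wrapper


@strip_directories
def get_document_path(instance, filename):
    # Hypothetical callable that does not call basename() itself.
    return os.path.join('uploads/documents', str(instance.owner_id), filename)


doc = SimpleNamespace(owner_id=7)
print(get_document_path(doc, '/tmp/report.pdf'))  # uploads/documents/7/report.pdf
```

Because the wrapper keeps the (instance, filename) signature, the decorated function can be passed to upload_to just like an undecorated one.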

2 comments:

  1. do you know any way to pass an absolute path to `upload_to`, instead of one which will be prefixed with `MEDIA_ROOT`?

  2. Have you tried creating a custom upload_to function (as described here?)