Thursday, January 20, 2011

How Django deals with Unicode

If you've read the Django documentation about Unicode, you'll have come across the following passage:

http://docs.djangoproject.com/en/dev/ref/unicode/

All of Django’s database backends automatically convert Unicode strings into the appropriate encoding for talking to the database. They also automatically convert strings retrieved from the database into Python Unicode strings. You don’t even need to tell Django what encoding your database uses: that is handled transparently.

So what's happening internally for CharField types? Well, it turns out that within the fields, smart_unicode() is invoked on the value (through get_prep_value()), converting it to a Python unicode object. The entire SQL query gets generated as a Unicode object, and then at the very end, if we're using a MySQL back-end, encode(charset) is invoked on the query:

MySQLdb/cursors.py:

def execute(self, query, args=None):
    """Execute a query.

    query -- string, query to execute on server
    args -- optional sequence or mapping, parameters to use with query.

    Note: If args is a sequence, then %s must be used as the
    parameter placeholder in the query. If a mapping is used,
    %(key)s must be used as the placeholder.

    Returns long integer rows affected, if any

    """
    # Excerpt only: `db` is the cursor's underlying connection; the lines
    # that set it up are omitted here.
    charset = db.character_set_name()
    # A unicode query is encoded to the connection's charset just before
    # it is sent to the server.
    if isinstance(query, unicode):
        query = query.encode(charset)
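
To make the flow concrete, here is a minimal sketch of the same pattern: coerce values to text, build the query as text, and encode only at the very end. It is not Django or MySQLdb code. The helper names (to_text, build_query, encode_for_driver) are hypothetical, and it is written for Python 3, where str plays the role that unicode plays in the Python 2 excerpt above. Quoting and escaping of parameters are deliberately left out, so treat it as an illustration of where the encoding happens rather than a safe way to build SQL.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of Django or MySQLdb.


def to_text(value, encoding="utf-8"):
    """Coerce a value to text, roughly the role smart_unicode() plays."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def build_query(template, params):
    """Build the whole query as text (escaping omitted for brevity)."""
    return template % tuple(to_text(p) for p in params)


def encode_for_driver(query, charset):
    """Encode once, at the last step, as the MySQLdb excerpt does."""
    if isinstance(query, str):
        query = query.encode(charset)
    return query


# A byte-string parameter is decoded to text before the query is built.
query = build_query(
    "SELECT * FROM people WHERE name = '%s'",
    ["Zoë".encode("utf-8")],
)

# Only here does the text become bytes in the connection's charset.
wire_bytes = encode_for_driver(query, "utf-8")
```

Keeping everything as text until the last step means the charset only matters in one place: swapping "utf-8" for "latin-1" in the final call changes the bytes sent to the server without touching how the query is assembled.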
