Friday, March 29, 2013

More Python 2.7 Unicode strangeness

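Apparently in Python 2.7 you can't concatenate a Unicode string with a UTF-8 encoded byte string. Python 2 first decodes the byte string with the default ASCII codec, which fails on any byte above 0x7f: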

>>> x = "sdds"
>>> y = u'\u2013t'.encode('utf-8')
>>> y
'\xe2\x80\x93t'
>>> x = u"sdds"
>>> x + y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
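The traceback can be reproduced by performing Python 2's implicit decode by hand. This is a minimal sketch that assumes a stock install, where the default encoding is ASCII (what `sys.getdefaultencoding()` returns on Python 2.7). The variable `msg` is just for illustration, and the snippet runs unchanged on Python 2.7 and 3.x because the codec is named explicitly.

```python
# Assumption: Python 2 evaluates u"sdds" + y by first decoding y with the
# default codec, which is ASCII on a stock 2.7 install. Doing that decode
# explicitly raises the same UnicodeDecodeError as the session above.
y = u'\u2013t'.encode('utf-8')   # byte string '\xe2\x80\x93t'

try:
    y.decode('ascii')            # the step Python 2 performs implicitly
    msg = None
except UnicodeDecodeError as exc:
    msg = str(exc)               # keep the error text for inspection

print(msg)
```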

But encoding the Unicode string to UTF-8 first works, since both operands are then byte strings:

>>> x.encode('utf-8') + y
'sdds\xe2\x80\x93t'
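Decoding the byte string explicitly works just as well and yields a Unicode result instead of a byte string; which side to convert is a design choice. A minimal sketch of both routes, assuming `y` is known to contain UTF-8 (the names `as_bytes` and `as_text` are hypothetical and only for illustration):

```python
x = u"sdds"
y = u"\u2013t".encode("utf-8")

# Option 1: stay in bytes by encoding the Unicode side (as in the post).
as_bytes = x.encode("utf-8") + y

# Option 2: stay in text by decoding the byte side with the right codec,
# instead of letting Python 2 guess ASCII.
as_text = x + y.decode("utf-8")

# Both routes hold the same characters, just in different representations.
assert as_bytes.decode("utf-8") == as_text
print(repr(as_bytes))
print(repr(as_text))
```

A common convention is to decode bytes to Unicode as soon as they enter the program and encode only when writing them out, so concatenations like this one only ever happen between values of the same type.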
