>>> a = u"Hey\u2019t"
>>> b = a.encode('utf-8')
>>> b.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 3: ordinal not in range(128)
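encode('utf-8') takes a Python unicode object and converts it into a Python str object, that is, a byte string holding the UTF-8 encoding of the text (not an ASCII string). When you call encode('utf-8') on a str object that is already encoded, Python 2 first has to turn it back into unicode, and it does so implicitly with the default ASCII codec. That decode fails on the byte 0xe2, which is why the error above is a UnicodeDecodeError even though encode was called. The best slide talk that discusses these issues can be found here: http://farmdev.com/talks/unicode/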
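The hidden decode step can be written out by hand. The following is a minimal sketch that should behave the same on Python 2 and Python 3: it decodes the UTF-8 bytes with the ASCII codec (the step Python 2 performs implicitly before b.encode('utf-8')) and then with the codec that actually produced them. The names ascii_error and round_trip are illustrative only.

```python
a = u"Hey\u2019t"        # text: unicode in Python 2, str in Python 3
b = a.encode('utf-8')    # bytes: U+2019 becomes the three bytes e2 80 99

# Decoding with the ASCII codec fails on the first non-ASCII byte (0xe2)
try:
    b.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the codec that produced the bytes recovers the original text
round_trip = b.decode('utf-8')
```

In other words, a byte string should be decoded with the encoding it was written in, not encoded a second time.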
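Python 2.x has these problems in general because it has two separate string types, unicode (written u'...') and str (written '...', a byte string that is assumed to be ASCII whenever it is converted implicitly), both derived from the basestring type. Python 3.0 solves this issue by making str the single text type, always Unicode, and moving raw bytes into a separate bytes type with no implicit conversion between the two.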
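For comparison, here is a short sketch of the same data on Python 3 (it will not behave this way on Python 2). The names text, data, bytes_can_encode and mixed are made up for the example.

```python
text = "Hey\u2019t"            # str: Unicode text
data = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

# bytes objects have no encode() method, so the double-encode mistake
# cannot go through an implicit ASCII decode
bytes_can_encode = hasattr(data, "encode")

# Mixing str and bytes raises TypeError instead of converting silently
try:
    text + data
    mixed = "allowed"
except TypeError:
    mixed = "TypeError"
```

Calling data.encode('utf-8') here raises an AttributeError right away, instead of a UnicodeDecodeError that only appears when the data happens to contain non-ASCII bytes.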