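>>> a = u"Hey\u2019t"
>>> b = a.encode('utf-8')
>>> b.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 3: ordinal not in range(128)

encode('utf-8') takes a Python unicode object and converts it into a Python byte string (str) object holding the UTF-8 bytes. When you call encode('utf-8') on a str object that is already encoded, Python 2 first implicitly decodes it to unicode using the default ASCII codec. That decode fails on the first non-ASCII byte (0xe2, the start of the UTF-8 sequence for \u2019), which is why the error above is a UnicodeDecodeError even though the call was encode.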
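One way to avoid the double-encode mistake is to check the type before encoding. The following is a minimal sketch, not a standard library function: the helper name to_utf8_bytes is hypothetical, and the code is written for Python 3 so it can be run as-is. On Python 2 the equivalent check would test for unicode rather than bytes.

```python
def to_utf8_bytes(value):
    """Return UTF-8 bytes for text; pass already-encoded bytes through.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, bytes):
        # Already encoded: encoding again is the mistake shown above.
        return value
    return value.encode("utf-8")


a = "Hey\u2019t"          # text containing a right single quotation mark
b = to_utf8_bytes(a)      # encoded once
c = to_utf8_bytes(b)      # second call is a no-op instead of an error
```

The helper assumes any bytes it receives are already UTF-8; it does not verify that.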
The best slide talk that discusses these issues can be found here: http://farmdev.com/talks/unicode/
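Python 2.x has these problems in general because it has two separate string types, u'' for unicode (text) and '' for str (a byte string, treated as ASCII by default), both derived from the basestring type, and it converts between them implicitly using the ASCII codec. Python 3.0 addresses this by making str a Unicode text type and moving raw bytes into a separate bytes type, with no implicit conversion between the two.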
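As a rough illustration of the Python 3 behaviour, the same sequence of steps looks like this. It is only a sketch of the type split, assuming a Python 3 interpreter; the variable names mirror the session at the top of the post.

```python
a = "Hey\u2019t"              # str: Unicode text in Python 3
b = a.encode("utf-8")         # bytes: the UTF-8 encoded form

# bytes objects have decode() but no encode(), so the double-encode
# from the Python 2 session cannot happen silently here.
has_encode = hasattr(b, "encode")

# Going back to text requires an explicit decode with a named codec.
round_trip = b.decode("utf-8")
```

Calling b.encode('utf-8') in Python 3 raises AttributeError immediately, instead of attempting a hidden ASCII decode.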